The Journal of 
Experimental Education 


A periodical report of scientific investigations relating to child development, 
learning, teaching, supervision, measurements, 
statistics, and experimental techniques. 


Volume XI 


March, 1943 Number 3 


MEASUREMENTS, STATISTICS, AND METHODS OF 
EXPERIMENTAL RESEARCH 


CONTENTS 
Approximate Multiple Regression Weights: Robert W. B. Jackson 221
Characteristics of Kurtosis: Douglas E. Scates
The Weighting of Tests Measuring the Same Function in Terms of Their Length: Teobaldo Casanova 238
Machine Methods of Handling Large Classes: John G. Watkins
Fisher's t-Test as a Special Case of His z-Test: P. J. Rulon
The Validity of a Comprehensive Examination for Scholarship Awards in New


$5.00 A YEAR PUBLISHED QUARTERLY $1.50 a Copy 


Edited and published by A. S. Barr, Professor of Education, University of Wisconsin, 
Madison, Wisconsin. 

Entered as second-class matter October 17, 1938 at the post office at Madison, Wisconsin, 
under the act of March 3, 1879. 



EDITORIAL BOARD 
A. S. Barr, Chairman, Professor of Education, University of Wisconsin, Madison, Wis. 


Carter V. Good, Professor of Education, Univer- 
sity of Cincinnati, Cincinnati, Ohio. Editorially 
responsible for materials on learning, teaching, 
and supervision, published each September. 

J. Wayne Wrightstone, Assistant Director, Bureau 
of Reference, Research and Statistics, Board of 
Education of the City of New York, 110 Liv- 
ingston Street, Brooklyn, New York. Editori- 
ally responsible for materials on 
construction, published each June. 


Edward E. Cureton, Senior Educational Statisti- 
cian, U. S. Office of Education, Washington, 
D. C. Editorially responsible for materials on 
measurements, statistics, and methods of exper- 
imental research, published each March. 

Arthur T. Jersild, Professor of Education, Teach- 
ers College, Columbia University, New York 
City. Editorially responsible for materials on 
child and development, pub- 
lished each December. 


CONTRIBUTING EDITORS 


Gilbert L. Betts, Supervisor of Graduate Research 
in Education, Colorado State College, Fort 
Collins, Colorado. 

William A. Brownell, Professor of Educational 
Psychology, Duke University, Durham, North 
Carolina. 

Leo J. Brueckner, Professor of Education, Univer- 
sity of Minnesota, Minneapolis, Minnesota. 

Oscar K. Buros, Associate Professor of Education, 
Rutgers University, New Brunswick, New 
Jersey. 

Otis W. Caldwell, General Secretary, The Amer- 
ican Association for the Advancement of Sci- 
ence, Boyce Thompson Institute for Plant 
Research, Yonkers, New York. 

Leslie L. Chisholm, Associate Professor of Educa- 
tion, State College of Washington, Pullman, 

Washington. 


Herbert S. Conrad, Associate Professor of Psy- 
chology, College of Agriculture, and Research 
Associate, Institute of Child Welfare, Univer- 
sity of California, Berkeley, California. 

Stephen M. Corey, Professor of Educational Psy- 
chology, University of Chicago, Chicago, Illinois. 

Robert A. Davis, Professor of Education, Director 
of Bureau of Educational Research, University 
of Colorado, Boulder, Colorado. 

Harl R. Douglass, Director of College of Educa- 
tion, University of Colorado, Boulder, Colo- 
rado. 

Jack W. Dunlap, Associate Professor of Educa- 
tional Psychology, University of Rochester, 
Rochester, New York. 

Harold A. Edgerton, Assistant Professor of Psy- 
chology, Ohio State University, Columbus, Ohio. 

Alvin C. Eurich, Professor of Education, Stanford 
University, Stanford University, California. 

John C. Flanagan, Assistant Chief, Research Sec- 
tion, Medical Division, Office, Chief of the Air 
Corps, War Department, Washington, D. C. 

Kai Jensen, Associate Professor of Education, 
University of Wisconsin, Madison, Wisconsin. 

Harold E. Jones, Professor of Psychology, Di- 
rector of Research, Institute of Child Welfare, 
University of California, Berkeley, California. 

Noel Keys, Professor of Education and Lecturer 
in Human Relations, University of California, 
Berkeley, California. 

Edward A. Lincoln, Consulting Psychologist, 
Halifax, Massachusetts. 

T. E. Newland, Pennsylvania State Department 
of Education, Chief of Special Education, Har- 
risburg, Pennsylvania. 

C. W. Odell, Associate Professor of Education, 
University of Illinois, Urbana, Illinois. 


Willard C. Olson, Professor of Education, Director 
of Research in Child Development, University 
of Michigan, Ann Arbor, Michigan. 

W. E. Peik, Dean and Professor of Education, Uni- 
versity of Minnesota, Minneapolis, Minnesota. 

S. L. Pressey, Professor of Educational Psychol- 
ogy, Ohio State University, Columbus, Ohio. 

Clarence E. Ragsdale, Associate Professor of Edu- 
cation, University of Wisconsin, Madison, Wis- 
consin. 


H. H. Remmers, Director, Division of Educational 
Reference, Professor of Education and Psy- 
chology, Purdue University, Lafayette, Indiana. 

Henry D. Rinsland, Professor of Education and 
Director of Educational Research, The Univer- 
sity of Oklahoma, Norman, Oklahoma. 

Robert T. Rock, Jr., Professor of Psychology, 
Head of Dept. of Psychology, Graduate School, 
Fordham University, New York City. 

G. M. Ruch, Chief of Research and Statistical 
Service, U. S. Office of Education, Washington, 
D. C. 

P. J. Rulon, Assistant Professor of Education, 
Graduate School of Education, Harvard Uni- 
versity, Cambridge, Massachusetts. 

Douglas E. Scates, Associate Professor of Educa- 
tion, Duke University, Durham, North Carolina. 

David Segel, Educational Consultant, Specialist in 
Tests and Measurements, Federal Security 
Agency, U. S. Office of Education, Washington, 
D. C. 

Paul W. Terry, Professor of Educational Psy- 
chology, University of Alabama, University, 
Alabama. 


Helen Thompson, Clinic of Child Development, 
Research Associate, Yale University, New 
Haven, Connecticut. 

Robert L. Thorndike, Associate Professor of Edu- 
cation, Teachers College, Columbia University, 
New York City. 

Herbert A. Toops, Professor of Psychology, Ohio 
State University, Columbus, Ohio. 

T. L. Torgerson, Professor of Education, Univer- 
sity of Wisconsin, Madison, Wisconsin. 

Helen M. Walker, Professor of Education, Teachers 
College, Columbia University, New York City. 

Beth L. Wellman, Professor of Psychology, Child 
Welfare Research Station, State University of 
Iowa, Iowa City, Iowa. 

Guy M. Wilson, Professor of Education, Boston 
University, Boston, Massachusetts. 

Paul A. Witty, Professor of Education, Director 
of Psycho-Educational Clinic, School of Educa- 
tion, Northwestern University, Evanston, 
Illinois. 

Ernest R. Wood, Professor of Education, New 
York University, New York City. 


DEMOCRAT PRINTING COMPANY 




APPROXIMATE MULTIPLE REGRESSION WEIGHTS 


ROBERT W. B. JACKSON 
Department of Educational Research 

In many problems in education we wish to 
combine a set of measurements in order to 
obtain the best possible estimates of the 
values of another variable, generally called 
the criterion variable. We may, for example, 
wish to predict the success of individuals in 
secondary schools from a knowledge of their 
grades in elementary schools, and/or from 
marks on objective achievement and intelli- 
gence tests. By following the careers of a 
group of students in secondary schools, we 
can determine a measure of their success. 
The problem then becomes one of determin- 
ing how to combine the marks obtained pre- 
viously so as to give the best possible predic- 
tion of this success. We shall assume in what 
follows that we have only one criterion vari- 
able, as it is in this form that the problem is 
most frequently found in practice. 

In most cases we are willing to assume that a simple weighted sum of the measurements will be sufficient. This is the case which will be considered in the present paper. If we denote by $Y_t$ the value of the criterion variable for the $t$-th individual, and by $X_{it}$ the value of the $i$-th measurement of the $t$-th individual, then the simple weighted sum of the measurements, $Y_t'$, may be written in the form:


$$Y_t' = a_0 + b_1 X_{1t} + b_2 X_{2t} + \cdots + b_k X_{kt} \qquad (1)$$


where it is assumed that we have $k$ measurements of each individual in addition to the value $Y_t$ of the criterion variable. In equation (1), $a_0$ is a constant, and the $b$'s are the weights and are also constants; the values of these constants are to be determined from the available data.

The method generally employed in problems of this kind, known as the multiple regression method, is based on the simple principle that the best estimates of the unknown constants of equation (1) are those which minimize the sum of squares of the differences between $Y_t$ and $Y_t'$, i.e. which minimize the quantity

$$\sum_{t=1}^{N} \left(Y_t - Y_t'\right)^2 \qquad (2)$$

where $N$ denotes the total number of individuals in the group considered. Substituting the value of $Y_t'$ from equation (1) in expression (2), differentiating the result partially with respect to each of the unknown constants, setting these derivatives equal to zero, and simplifying, we have a set of $k + 1$ equations to solve for the set of $k + 1$ unknowns. Since the value of $a_0$ is easily found to be


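$$a_0 = \bar{Y} - b_1 \bar{X}_1 - \cdots - b_k \bar{X}_k \qquad (3)$$

where the bars denote the mean values of the variables in question, we really have only $k$ equations in $k$ unknowns, i.e. the $k$ equations involving the $b$'s. Denoting the sums of squares and products of deviations from the means by $S_{ij}$ and $S_{iy}$, i.e.

$$S_{ij} = \sum_{t=1}^{N} (X_{it} - \bar{X}_i)(X_{jt} - \bar{X}_j), \qquad S_{iy} = \sum_{t=1}^{N} (X_{it} - \bar{X}_i)(Y_t - \bar{Y}) \qquad (4)$$

where $i, j$ can take all values from 1 to $k$, the $k$ equations may be written in the following form:

$$\begin{aligned} b_1 S_{11} + b_2 S_{12} + \cdots + b_k S_{1k} &= S_{1y} \\ b_1 S_{21} + b_2 S_{22} + \cdots + b_k S_{2k} &= S_{2y} \\ &\;\;\vdots \\ b_1 S_{k1} + b_2 S_{k2} + \cdots + b_k S_{kk} &= S_{ky} \end{aligned} \qquad (5)$$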




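Using determinants, we find that the solution for the $i$-th regression weight is

$$b_i = \frac{\Delta_i}{\Delta} \qquad (6)$$

where $\Delta$ ($\neq 0$) denotes the determinant formed by the coefficients of the $b$'s, i.e.

$$\Delta = \begin{vmatrix} S_{11} & S_{12} & \cdots & S_{1k} \\ S_{21} & S_{22} & \cdots & S_{2k} \\ \vdots & \vdots & & \vdots \\ S_{k1} & S_{k2} & \cdots & S_{kk} \end{vmatrix} \qquad (7)$$

and $\Delta_i$ denotes the determinant formed by replacing the $i$-th column of $\Delta$ by $S_{1y}, S_{2y}, \ldots, S_{ky}$, i.e.

$$\Delta_i = \begin{vmatrix} S_{11} & \cdots & S_{1y} & \cdots & S_{1k} \\ S_{21} & \cdots & S_{2y} & \cdots & S_{2k} \\ \vdots & & \vdots & & \vdots \\ S_{k1} & \cdots & S_{ky} & \cdots & S_{kk} \end{vmatrix} \qquad (8)$$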


Although this method gives us the best possible prediction in the sense defined above, it involves a great deal of work when the number of measurements is at all large. To find the values of the $b$'s we must calculate the $k(k+1)/2$ separate values of the $S$'s, and then solve the set of $k$ simultaneous equations. When $k = 100$, for example, there are 5,050 values of the $S$'s and 100 simultaneous equations to solve. This exact method is therefore not practicable in many problems, and it is necessary to use approximate regression weights. The following sections are devoted to a discussion of certain approximations which may profitably be employed.
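The exact solution can be sketched in a few lines of Python. This is an illustration only, assuming NumPy is available and that the data are held as an N × k array `X` (one column per measurement) and a length-N array `y` for the criterion; the function names `exact_weights` and `cramer_weight` are hypothetical. The sketch forms the $S_{ij}$ and $S_{iy}$ of (4), solves the normal equations (5), and recovers $a_0$ from (3); `cramer_weight` evaluates (6) directly as a check, although for large $k$ a linear solver is far cheaper than evaluating $k + 1$ determinants.

```python
import numpy as np

def exact_weights(X, y):
    """Exact multiple regression weights from the normal equations (5).

    X: (N, k) array, one column per measurement X_i.
    y: (N,) array, the criterion Y.
    Returns (a0, b): a0 from equation (3) and b solving (5).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    dX = X - X.mean(axis=0)            # deviations X_it - mean(X_i)
    dy = y - y.mean()                  # deviations Y_t - mean(Y)
    S = dX.T @ dX                      # the k x k matrix of S_ij, equation (7)
    S_y = dX.T @ dy                    # the S_iy on the right-hand side of (5)
    b = np.linalg.solve(S, S_y)        # same result as Cramer's rule (6)-(8)
    a0 = y.mean() - X.mean(axis=0) @ b # equation (3)
    return a0, b

def cramer_weight(S, S_y, i):
    """b_i = Delta_i / Delta, equation (6): replace column i of S by S_y."""
    S_i = np.array(S, dtype=float)
    S_i[:, i] = S_y
    return np.linalg.det(S_i) / np.linalg.det(S)
```

With $k = 100$ the matrix `S` holds the 5,050 distinct $S$'s mentioned above (it is symmetric), so the cost of the exact method is mostly in forming it.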

The reader may feel that an approximate solution can never be of much value, and that it will always be necessary to calculate the exact values of the weights. Experience has shown that this is not so. In many cases the approximate weights give results which are, for all practical purposes, just as good as the exact weights. We find, for example, that while the signs of the weights are very important, relatively large changes in the absolute values of the weights do not greatly affect the predictive value of $Y'$. The examples given at the end of the paper show clearly that this is the case. One other point should also be noted: the weights as defined by equation (6) are actually partial regression weights, i.e. $b_i$, for example, is the regression coefficient of the criterion variable on the $i$-th measurement with the influence of all the other $k - 1$ measurements partialled out or held constant. The exact weights are, therefore, the total regression weights (i.e. the weights when only one measurement is considered at a time) adjusted to allow for the influence of the other measurements. It follows that in developing the approximate weights we should pay particular attention to this adjustment of the total regression weights.

In many examples, some of the measure- 
ments have very little influence (i.e. the re- 
spective weights are small) and the question 
arises as to whether these should be retained 
or discarded. This is an additional question 
and has not been considered in the present 
paper. It is sufficient to note that in these 
cases the measurements with little influence 
may be discarded without greatly affecting 
the predictive value of Y’. We shall assume 
in what follows, however, that all the meas- 
urements are to be used in the prediction 
equation. 

Once we obtain a set of regression weights, either approximate or exact, we must determine the accuracy with which these enable us to estimate the values of the criterion variable. If we denote by ${}_lY_t'$ the predicted or estimated score of the $t$-th individual obtained by using the $l$-th set of regression weights, we may use the quantity

$$\chi_l^2 = \sum_{t=1}^{N} \left(Y_t - {}_lY_t'\right)^2 \qquad (9)$$

as a measure of the accuracy, or inaccuracy, of our estimates. If we denote by $R_l$ the correlation between the two sets of scores $Y_t$ and ${}_lY_t'$, and by $S_{yy}$ the sum of squares about the mean of $Y_t$, then, provided the predicted scores are the least-squares linear estimates of $Y_t$ from the weighted sum of the measurements (as they are when the exact weights are used), equation (9) may be written in the form

$$\chi_l^2 = S_{yy}\left(1 - R_l^2\right) \qquad (10)$$

It follows, therefore, that the coefficient $R_l$ is the best measure we can use of the accuracy with which the criterion scores may be estimated. We could, of course, compare the approximate with the exact regression weights, where these are known,² but this is an indirect rather than a direct approach to the problem, and in our case at least it would be neither particularly useful nor interesting.
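A minimal numerical check of equation (10), assuming NumPy; `multiple_R` and `residual_ss` are hypothetical helper names. Because $R_l$ is unchanged when all the weights are multiplied by a constant, the sketch refits the intercept and scale of the weighted composite by least squares before forming the sum of squares (9). That refitting is what makes (10) hold exactly for an arbitrary set of weights; for the exact weights it is automatic.

```python
import numpy as np

def multiple_R(X, y, b):
    """Correlation R_l between the criterion and the composite X @ b."""
    composite = np.asarray(X, dtype=float) @ np.asarray(b, dtype=float)
    return float(np.corrcoef(composite, y)[0, 1])

def residual_ss(X, y, b):
    """Sum of squares (9) after fitting Y linearly on the composite.

    The intercept and scale of the composite are refitted by least squares
    (an assumption of this sketch), so that equation (10) holds exactly.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(X, dtype=float) @ np.asarray(b, dtype=float)
    dz, dy = z - z.mean(), y - y.mean()
    slope = (dz @ dy) / (dz @ dz)
    y_hat = y.mean() + slope * dz
    return float(((y - y_hat) ** 2).sum())
```

Multiplying `b` by any nonzero constant leaves `multiple_R` unchanged, which is why $R_l$ rather than the raw weights is the natural yardstick for comparing sets of approximate weights.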

Clearly there would be little practical advantage in using approximate regression weights if their calculation involved almost as much work as the calculation of the exact weights. Most of the labour involved in the exact solution enters in the calculation of the many values of the $S_{ij}$ for which $i \neq j$, and in the solution of the $k$ simultaneous equations. None of this need be done in calculating the values of the approximate weights given below. In adjusting the weights for the influence of the other measurements we use the average value of the $S_{ij}$ ($i \neq j$), which may be denoted by $\bar{S}$. If we define

$$T_t = X_{1t} + X_{2t} + \cdots + X_{kt} \qquad (11)$$

and denote by $S_{TT}$ the sum of squares about the mean of $T_t$, then

$$\bar{S} = \frac{S_{TT} - \sum_{i=1}^{k} S_{ii}}{k(k-1)} \qquad (12)$$

² Wherry, R. J. "An approximation method for ... criterion." Psychometrika, 1940, 5, 1...
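The point of (11) and (12) is that $\bar{S}$ can be had from the $k$ diagonal sums of squares plus one extra sum of squares, that of the totals, since $S_{TT}$ is the sum of every $S_{ij}$, diagonal included. A sketch, assuming NumPy and taking $T_t$ to be the unweighted total of the $k$ measurements; `average_off_diagonal_S` is a hypothetical name.

```python
import numpy as np

def average_off_diagonal_S(X):
    """Average of the S_ij (i != j) via equations (11)-(12).

    Only the k sums of squares S_ii and the sum of squares S_TT of the
    totals T_t = X_1t + ... + X_kt are formed; no S_ij with i != j is.
    """
    X = np.asarray(X, dtype=float)
    _, k = X.shape
    dX = X - X.mean(axis=0)
    S_ii = (dX ** 2).sum(axis=0)          # diagonal sums of squares
    T = X.sum(axis=1)                     # equation (11)
    S_TT = ((T - T.mean()) ** 2).sum()    # sum of squares of the totals
    return (S_TT - S_ii.sum()) / (k * (k - 1))   # equation (12)
```

The work is $k + 1$ sums of squares instead of $k(k+1)/2$ sums of squares and products.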


If $\bar{S}$ is set equal to zero then, of course, the approximate regression weights are exactly equal to the total regression weights discussed earlier, i.e.

$${}_1b_i = S_{iy}/S_{ii} \qquad (13)$$


where ${}_1b_i$ denotes the $i$-th regression weight of the first approximate set. As will be shown later, these are very useful weights and in many cases will be sufficiently accurate for all practical purposes. In the development of the next set of approximate weights it is assumed that $\bar{S} \neq 0$.

It is possible to develop sets of weights based on the assumption that the $S_{ij}$ ($i \neq j$) are equal, and equal to the average value $\bar{S}$, with or without an additional assumption regarding the values of the $S_{ii}$. Experience has shown, however, that these sets of approximate regression weights do not always give satisfactory results; in some cases very peculiar results may occur because one obtains what correspond to negative sums of squares. Fortunately it is not difficult to avoid this difficulty. Equation (6) may be written in the form

$$b_i = \frac{\sqrt{S_{yy}}}{\sqrt{S_{ii}}} \cdot \frac{\Delta'_i}{\Delta'} \qquad (14)$$

where $\Delta'$ denotes the determinant obtained by dividing the rows and columns of $\Delta$ by the square roots of the diagonal elements, and $\Delta'_i$ denotes the determinant formed by replacing the $i$-th column of $\Delta'$ by $r_{1y}, r_{2y}, \ldots, r_{ky}$; the letter $r$ is used as usual to denote the product-moment correlation coefficient. All the diagonal elements of $\Delta'$ have the value unity, and the other elements the values of the respective $r_{ij}$ ($i \neq j$).

If we now assume that the $r_{ij}$ are all equal, and equal to some common value which we may denote by $r$, then we can easily evaluate $\Delta'$ by subtracting the elements of the first row from the corresponding elements of all the other rows in turn, and expanding in terms of the elements in this row. Similarly, $\Delta'_i$ may be evaluated by interchanging rows and columns so that the value of $r_{iy}$ appears in the upper left-hand corner position, subtracting the elements of the first row from the corresponding elements of all the other rows in turn, and expanding in terms of the elements of this row. Substituting the values of $\Delta'$ and $\Delta'_i$ so obtained in equation (14), we find:

$${}_2b_i = \frac{\sqrt{S_{yy}}}{\sqrt{S_{ii}}} \cdot \frac{1}{1-r}\left[r_{iy} - \frac{r}{1+(k-1)r}\sum_{j=1}^{k} r_{jy}\right] \qquad (15)$$

where ${}_2b_i$ denotes the $i$-th regression weight of the second approximate set.

In estimating the average value of the $r_{ij}$, i.e. in estimating the value of $r$, we can use the value of $\bar{S}$. Thus we may use as an estimate of $r$ either

$$r' = \bar{S}/\bar{S}_{ii} \qquad (16)$$

or

$$r'' = \bar{S}\big/\overline{\sqrt{S_{ii}S_{jj}}} \quad (i \neq j) \qquad (17)$$

where $\bar{S}_{ii}$ denotes the average value of the $S_{ii}$, and $\overline{\sqrt{S_{ii}S_{jj}}}$ denotes the average value of the square roots of the products of $S_{ii}$ and $S_{jj}$. These are only rough approximations to the value of $r$, but they seem to be sufficiently accurate for our purpose.
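Under the equal-correlation assumption the determinants collapse to closed forms, so the second set needs only the $S_{ii}$, $S_{iy}$, $S_{yy}$ and one estimate of $r$. The sketch below assumes NumPy, and its function names are hypothetical. It computes $r'$ and $r''$ from (16) and (17), using the identity that the average of $\sqrt{S_{ii}S_{jj}}$ over $i \neq j$ equals $\left[(\sum_i \sqrt{S_{ii}})^2 - \sum_i S_{ii}\right]/k(k-1)$, and evaluates (15), which is the solution for a correlation matrix with unit diagonal and every off-diagonal element equal to $r$.

```python
import numpy as np

def estimate_r(S_bar, S_ii):
    """Rough estimates r' (16) and r'' (17) of the common intercorrelation."""
    S_ii = np.asarray(S_ii, dtype=float)
    k = S_ii.size
    r1 = S_bar / S_ii.mean()                          # equation (16)
    root = np.sqrt(S_ii)
    # mean of sqrt(S_ii * S_jj) over i != j, from two sums only
    mean_root_prod = (root.sum() ** 2 - S_ii.sum()) / (k * (k - 1))
    r2 = S_bar / mean_root_prod                       # equation (17)
    return r1, r2

def second_set_weights(S_ii, S_iy, S_yy, r):
    """Second approximate set, equation (15): all r_ij (i != j) equal to r."""
    S_ii = np.asarray(S_ii, dtype=float)
    S_iy = np.asarray(S_iy, dtype=float)
    k = S_ii.size
    r_iy = S_iy / np.sqrt(S_ii * S_yy)                # correlations with Y
    beta = (r_iy - r * r_iy.sum() / (1 + (k - 1) * r)) / (1 - r)
    return np.sqrt(S_yy / S_ii) * beta                # rescale as in (14)
```

Setting `r = 0` reduces `second_set_weights` to $S_{iy}/S_{ii}$, the first set (13), which is a convenient sanity check.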

Various other approximate weights have been suggested for use. We might use the unweighted sum of the scores, i.e. use the unit weights

$${}_3b_i = 1 \qquad (18)$$

or use weights of unity with signs the same as the signs of the $S_{iy}$, i.e. use the weights

$${}_4b_i = \pm 1 \quad (\text{sign of } S_{iy}) \qquad (19)$$

(which are the same as the ${}_3b_i$ if all the $S_{iy}$ are positive), or use weights proportional to the correlations with the criterion, i.e. use the weights

$${}_5b_i = r_{iy} \qquad (20)$$


There are numerous other weights which 
might be considered, but like those given in 
equations (18) and (19), there is little theo- 
retical justification for their use. We shall 
use only the above five sets of weights in the 
examples given below. 
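To compare the five sets on one's own data, the sketch below computes each of them and the correlation $R$ of the weighted composite with the criterion. It assumes NumPy, takes $T_t$ as the unweighted total, uses $r'$ of (16) in the second set by default, and the names `five_weight_sets` and `R_of` are hypothetical. Since the exact weights maximize $R$, every approximate set should give a value at or below the exact one.

```python
import numpy as np

def five_weight_sets(X, y, r=None):
    """Approximate weights (13), (15), (18), (19) and (20); a sketch.

    X is an (N, k) array of measurements, y the criterion.  If r is None,
    the estimate r' = S_bar / mean(S_ii) of equation (16) is used in (15).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    _, k = X.shape
    dX = X - X.mean(axis=0)
    dy = y - y.mean()
    S_ii = (dX ** 2).sum(axis=0)              # diagonal sums of squares only
    S_iy = dX.T @ dy                          # products with the criterion
    S_yy = dy @ dy
    T = dX.sum(axis=1)                        # deviations of the totals T_t
    S_bar = (T @ T - S_ii.sum()) / (k * (k - 1))   # equation (12)
    if r is None:
        r = S_bar / S_ii.mean()               # r', equation (16)
    r_iy = S_iy / np.sqrt(S_ii * S_yy)
    beta = (r_iy - r * r_iy.sum() / (1 + (k - 1) * r)) / (1 - r)
    return {
        "(13)": S_iy / S_ii,                  # total regression weights
        "(15)": np.sqrt(S_yy / S_ii) * beta,  # equal-correlation set
        "(18)": np.ones(k),                   # unit weights
        "(19)": np.sign(S_iy),                # +1 or -1 by the sign of S_iy
        "(20)": r_iy,                         # correlations with the criterion
    }

def R_of(X, y, b):
    """Correlation between the criterion and the composite X @ b."""
    return float(np.corrcoef(np.asarray(X, dtype=float) @ b, y)[0, 1])
```

In line with the practical procedure described in the discussion of the examples, one would compute the first, second and fifth sets and keep whichever gives the largest `R_of(X, y, b)`.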


EXAMPLES 


The first example is one considered by Travers,³ the second is one given by Thomson,⁴ and the third is one given by Fisher.⁵ In all three cases the exact weights, and hence the maximum correlation with the criterion, were available. Also, the values of the $r_{ij}$ were given, and hence the actual average value of the $r_{ij}$ could be used as well as the estimates of $r$ obtained from equations (16) and (17). The correlations with the criterion of the scores weighted according to the above formulae are given in Table I; the maximum values are given in the first row.

TABLE I

VALUES OF THE MULTIPLE CORRELATION COEFFICIENT, R, OBTAINED BY USING EXACT AND
APPROXIMATE MULTIPLE REGRESSION WEIGHTS

                                                Value of R
Method                     Equation No.   Travers'    Thomson's    Fisher's
                                          Example     Example      Example
Weights                       (13)          .785        .676         .665
Using average r_ij            (15)          .787        .668         .662
Using r''                     (15)          ...         ...          ...
Weights                       (18)          .676        .403         ...

³ Travers, R. M. W. "Use of Discriminant Function in the Treatment of Psychological Group Differences." Psychometrika 4: 25-32.

⁴ ... Test Scores of a Representative Group of ... Children, pp. 27-33. Publication XVI, Scottish Council for Research in Education. London: University of London Press, 1940.

⁵ Fisher, R. A. Statistical Methods for Research Workers (7th edition), pp. 163-166. Edinburgh: Oliver & Boyd, 1938.

⁶ Flanagan, J. C. "... Approximation Solution ... Problems ... Number of Variables." ... Research ...

For these examples it is clear that the results obtained by using the approximate weights are so satisfactory that the labour involved in calculating the exact weights is not warranted. It is realized, of course, that different results may be obtained in other examples, but nevertheless it is felt that the results given are typical of those which may be obtained by the use of approximate regression weights. In the first example the intercorrelations between the measurements are low, but in the last two examples they are not. In spite of these differences, the approximate weights yield very satisfactory results in all three cases.

The third and fourth sets of approximate weights are not as satisfactory as the others. The first and second sets of weights work about equally well, while the fifth set may give equally good, better, or poorer results. Apparently no single method will generally yield the best results. With regard to the second set of weights, it is clear that we may use either $r'$ or $r''$ as an estimate of $r$. Since $r'$ is easier to calculate, it would seem to be the more useful estimate of $r$. In any practical problem one would presumably use the first, second (based on $r'$), and fifth sets of weights and choose for final use those which yield the highest value of $R$.

Flanagan⁶ has suggested a successive approximation solution for problems of this kind. It involves an adjustment of the approximate regression weights to allow for the influence of the $S_{ij}$ or $r_{ij}$ ($i \neq j$), and is therefore based on the same general principle as the solution involving $\bar{S}$ or $r$ discussed above. In his solution, however, one must first apply the approximate regression weights to obtain the predicted scores (which are used in determining the value of $R$), and then
find out how well each of the variables agrees with these predicted scores, not the original criterion scores. By comparing the correlation of each variable with the criterion with the correlation of each variable with the predicted scores, a basis for revising the weights is provided. According to Flanagan⁷ (see p. 77), the weights are revised by "simply increasing the weight of those variables which are not well enough represented in the predicted score in comparison to their representation in the criterion score. Not well enough represented means that the correlation is not as high. Similarly, one reduces the weights or eliminates the variables which are too well represented in the predicted score; that is, in comparison with their representation in the criterion score."

Unfortunately, there does not appear to be any definite method of making these adjustments which will cover all situations. Flanagan suggests, as a rule of thumb, "make an adjustment proportional to the difference of the correlation coefficients, in the opposite direction." Also, if one changes a large number of the weights at once, even in accordance with the above rule, it is possible to get a lower value of $R$. This weakness had been noted by the above author (see the discussion on page 79 of his article), but no method has been suggested for overcoming it.

⁷ Op. cit.

Clearly this method does not admit of gen- 
eral application and its usefulness, therefore, 
is considerably limited. Since, in addition, a 
great deal of labour is involved in making 
the adjustments, and the data of Table I in- 
dicate that the resulting improvement is in 
any case likely to be slight, it is doubtful if 
such a successive approximation solution will 
be of much practical value. This question, 
and the question of the practical value of the 
approximate weights suggested in this paper, 
can only be answered by further studies of 
this kind. Since the multiple regression 
method is an exceedingly powerful one and 
would be more widely used if it did not in- 
volve so much arithmetical labour, it is to be 
hoped that these studies will be made. 
CHARACTERISTICS OF KURTOSIS


DOUGLAS E. SCATES 
Duke University, Durham, N. C. 


This paper presents an exploration of kur- 
tosis—of the concept of kurtosis which is 
generally held, and of the extent to which the 
common measures of kurtosis are in agree- 
ment with that concept. Certain surprising, 
and rather disturbing, characteristics are 
found in the function that is represented by 
the ordinary moment formula for kurtosis. 


HISTORY AND GENERAL NATURE OF KURTOSIS 


The concept of kurtosis and the moment 
formula for measuring it were introduced in 
1905 by Karl Pearson with the following 
statement: 


“Given two frequency distributions which 
have the same variability as measured by the 
standard deviation, they may be relatively 
more or less flat-topped than the normal 
curve. If more flat-topped I term them platy- 
kurtic, if less flat-topped leptokurtic, and if 
equally flat-topped mesokurtic. A frequency 
distribution may be symmetrical, satisfying 
both the first two conditions for normality, 
but it may fail to be mesokurtic, and thus 
the Gaussian curve cannot describe it." (p. 
173).¹ The definition given recently by Kurtz 
and Edgerton emphasizes the formula which 
Pearson proposed,² namely kurtosis = $\beta_2 - 3$. 

“The relative degree of flatness (platykur- 
tosis) or peakedness (leptokurtosis) in the 
region about the mode of a frequency curve, 
as compared to the normal probability curve 
of the same variance, which is mesokurtic. 
When $\beta_2$, the ratio of the fourth moment about the arithmetic mean to the square of the second moment, is greater than 3, the frequency curve is leptokurtic; when $\beta_2$ is less than 3, the frequency curve is platykurtic; and when $\beta_2$ is equal to 3, the frequency curve is mesokurtic. Sometimes called excess."³


¹ Karl Pearson, "Das Fehlergesetz und seine Verallgemeinerungen durch Fechner und Pearson. [Skew Variation:] A Rejoinder." Biometrika 4: 169-212; June 1905. This quotation is given by Helen M. Walker, Studies in the History of Statistical Method, p. ... Baltimore, Md.: Williams and Wilkins Co., 1929.

² Ibid., p. 181.

³ Albert K. Kurtz and Harold A. Edgerton, Statistical Dictionary of Terms and Symbols, p. 88. New York: John Wiley ...
This definition should be amended by substituting "moment coefficient" for "moment" in the two places it occurs. The moment coefficient is the moment divided by $N$ (or by degrees of freedom), and since the second moment coefficient is squared, this has the effect ultimately of introducing an $N$ in the numerator of the ratio. We have, accordingly,

$$\beta_2 = \frac{\mu_4}{(\mu_2)^2} = \frac{N \sum x^4}{\left(\sum x^2\right)^2}$$

where $x$ denotes a deviation from the mean. $\beta_2$ is sometimes designated as $\alpha_4$.
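A small Python sketch of the moment ratio, using the divisor $N$ for the moment coefficients (the degrees-of-freedom variant mentioned above would change the values slightly); `beta2` and `excess` are hypothetical names. A symmetric two-point distribution gives the smallest possible value, 1, and a rectangular distribution approaches 1.8.

```python
def beta2(xs):
    """Pearson's beta_2 = mu_4 / mu_2**2, moment coefficients taken with divisor N."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # second moment coefficient
    m4 = sum((x - mean) ** 4 for x in xs) / n   # fourth moment coefficient
    return m4 / m2 ** 2

def excess(xs):
    """Kurtosis in the beta_2 - 3 form: 0 for the normal curve."""
    return beta2(xs) - 3.0
```

For example, ten scores consisting of eight zeros flanked by one $-1$ and one $+1$ give $\beta_2 = 5$, a sharply leptokurtic value.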


A number of writers employ the term excess as synonymous with kurtosis, Richardson⁴ pointing out that this usage follows the Scandinavian school. Schmidt⁵ offers the terms overconcentration and underconcentration. Kurtz and Edgerton⁶ give the terms homokurtic and heterokurtic as applied respectively to two-way frequency tables which have arrays of equal kurtosis, or of different degrees of kurtosis. Two other terms involving "kurtic" are not related to the flat-topped character. Isokurtic was used by Pearson to indicate symmetry and allokurtic to indicate skewness.⁷


Cone⁸ canvassed statistical literature to discover treatments of kurtosis. An inspection of all volumes of the Journal of the American Statistical Association and the Annals of Mathematical Statistics, and of suggestive titles in the Journal of Educational Psychology, Biometrika, and the Journal of the Royal Statistical Society, revealed only a small number of articles dealing with kurtosis. Seventy-seven statistics textbooks were examined; about half treated kurtosis. Of the books published before 1930, three out of every ten mentioned the topic; of those published after 1929, seven out of every ten treated the subject.

⁴ C. H. Richardson, An Introduction to Statistical Analysis, ... Enlarged edition. New York: Harcourt, Brace & Co., ...

⁵ Robert Schmidt, "Statistical Analysis of One-Dimensional Distributions." Annals of Mathematical Statistics 5: 54; March ...

⁶ Op. cit., pp. 77 and 75.

⁷ Helen M. Walker, Studies in the History of Statistical Method, p. 182. Baltimore, Md.: Williams and Wilkins Co., 1929. Kurtz and Edgerton (op. cit.) restrict these terms to ...

Throughout the literature two formulas 
are basic—the one given by Pearson and the 
rank formula given by Kelley. The latter was 
introduced in 1921,⁹ and is based on two rank 
measures of dispersion, namely, the quartile 
deviation and the 10-90 percentile range. 
Kelley explained: “For a mesokurtic distribu- 
tion, Q = 0.26315 D. Therefore, if Q/D is 
less than 0.26315, the distribution is lepto- 
kurtic, and if Q/D is greater than 0.26315, 
the distribution is platykurtic.” The values 
given by the formula therefore run in the 
opposite direction to those given by the 
moment formula of Pearson. It would seem 
appropriate to subtract them from 1 or to 
invert the ratio. 
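Kelley's ratio and its normal-curve value can be sketched with Python's standard `statistics` module (Python 3.8 or later); `kelley_ratio` is a hypothetical name. The percentile convention is an assumption: `statistics.quantiles` interpolates with its default "exclusive" method, which need not match the procedure Kelley used for grouped data. The normal value comes out at about 0.26315, matching Kelley's constant, while evenly spaced (rectangular) data give 0.3125, on the platykurtic side.

```python
from statistics import NormalDist, quantiles

def kelley_ratio(xs):
    """Kelley's rank measure Q/D: quartile deviation over the 10-90 percentile range."""
    deciles = quantiles(xs, n=10)        # nine cut points P10, ..., P90
    quarts = quantiles(xs, n=4)          # Q1, median, Q3
    Q = (quarts[2] - quarts[0]) / 2      # quartile deviation
    D = deciles[8] - deciles[0]          # 10-90 percentile range
    return Q / D

# Value for the normal curve, computed from its exact percentiles.
z = NormalDist()
normal_ratio = ((z.inv_cdf(0.75) - z.inv_cdf(0.25)) / 2) / (z.inv_cdf(0.90) - z.inv_cdf(0.10))
```

Inverting the ratio, as suggested above, would make larger values mean greater peakedness, in the same direction as $\beta_2$.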


More than half of the textbook writers cited by Cone as treating kurtosis gave $\beta_2$ (the ratio of the fourth moment coefficient to the square of the second moment coefficient, both taken about the mean); in this case, 3 would be the criterion value for mesokurtosis. Other writers subtracted the three, making 0 the criterion value. A number of the latter writers divided the difference by 2 or by 8. Schmidt¹⁰ measured kurtosis by the ratio of the Tchebychef coefficient of kurtosis to sigma. Snedecor¹¹ followed Fisher in employing cumulants (instead of moment coefficients), which in effect give $\beta_2 - 3$ for the parent population. Miss Cone found the usual proportion of errors in formulas.

Standard error formulas for the moment and rank coefficients were presented by Pearson and Kelley, respectively, as $\sqrt{24/N}$ and $0.2778/\sqrt{N}$. While the latter value is about one seventeenth of the first, the value of the rank coefficient for a normal distribution is about one eleventh that of $\beta_2$. When the distribution is not normal, the standard errors become more complex.¹²
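The two ratios quoted can be checked directly, since $N$ cancels; the normal-curve values 3 for $\beta_2$ and 0.26315 for $Q/D$ are taken from the text above. Dividing each standard error by the normal-curve value of its own coefficient (a comparison the text implies but does not carry out) puts the two measures on a common footing:

$$\frac{\sqrt{24/N}}{0.2778/\sqrt{N}} = \frac{\sqrt{24}}{0.2778} \approx \frac{4.899}{0.2778} \approx 17.6, \qquad \frac{3}{0.26315} \approx 11.4$$

$$\frac{\sqrt{24/N}}{3} \approx \frac{1.63}{\sqrt{N}}, \qquad \frac{0.2778/\sqrt{N}}{0.26315} \approx \frac{1.06}{\sqrt{N}}$$

Relative to the size of the quantity being estimated, then, the two standard errors are of the same order, the rank coefficient's being somewhat the smaller.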


⁹ Truman L. Kelley, "A New Measure of ..." Journal of the American Statistical Association 17: 743-49; June 1921. Also given in his Statistical Method, p. 77. New York: Macmillan Co., 192...

¹⁰ Robert Schmidt, "Statistical Analysis of One-Dimensional Distributions." Annals of Mathematical Statistics 5: 51; ...

¹¹ ... pp. 76-79 (Sixth edition). Edinburgh: Oliver and Boyd, ...

¹² ... in: H. L. Rietz, editor, Handbook of Mathematical Statistics, p. 96. Boston: ...

¹³ Approximate probable error values are given in Table II of: Karl Pearson, editor, Tables for Statisticians and Biometricians, Part I. Cambridge, England: University ...
In more recent writings the skewness of the sampling distribution has been recognized, and the probable error has given way to probability levels, or fiducial limits. Geary and Pearson* presented tables and diagrams giving approximately the 1% and 5% probability levels for β₂. They warn, however, that “Owing to the extreme skewness of its sampling distribution the accuracy of the tabled probability levels for b₂ for, say, n′ less than 200 cannot be finally assessed without further investigation.” (p. 2) Because of this difficulty with the sampling distribution Geary proposed another measure of kurtosis, especially for the purpose of testing whether a sample was likely (or unlikely) to have been drawn from a normal population. This alternative function is the ratio of the mean deviation to the standard deviation. For the normal distribution this becomes √(2/π), which is the familiar value of 0.7979, or twice the height of the midordinate in the unit normal curve. Tables and diagrams for the fiducial limits of this ratio are given, and “its use in place of b₂ is recommended, at any rate for n′ less than 200.” (p. 2). This ratio, like the one based on rank values proposed by Kelley, gives values in the opposite direction to those of β₂. That is, when the ratio is small (below 0.8) the distribution is leptokurtic.
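A minimal sketch of Geary's ratio (mean deviation divided by standard deviation), assuming population (divide-by-N) formulas for both quantities. The helper name `geary_ratio` and the two synthetic "samples", built from evenly spaced quantiles of a peaked (Laplace) shape and a flat (rectangular) shape, are illustrative assumptions and do not come from Geary and Pearson's tables. The sketch shows that the normal value √(2/π) equals twice the unit-normal midordinate, and that a peaked shape falls below 0.8 while a flat one rises above it.

```python
import math
import statistics


def geary_ratio(values):
    """Mean deviation about the mean divided by the standard deviation.

    Both are population (divide-by-N) quantities, an assumption of this
    sketch chosen so that the normal-curve value is exactly sqrt(2/pi).
    """
    mean = statistics.fmean(values)
    mean_dev = statistics.fmean(abs(v - mean) for v in values)
    return mean_dev / statistics.pstdev(values)


# Theoretical value for the normal curve, and twice its midordinate.
normal_value = math.sqrt(2 / math.pi)
twice_midordinate = 2 / math.sqrt(2 * math.pi)

# Deterministic "samples" from evenly spaced quantiles (no random seed):
# a peaked Laplace shape and a flat rectangular shape.
n = 2001
probs = [(i + 0.5) / n for i in range(n)]
laplace = [math.copysign(-math.log(1 - abs(2 * p - 1)), 2 * p - 1) for p in probs]
rectangle = [p - 0.5 for p in probs]

print(round(normal_value, 4))             # 0.7979
print(round(geary_ratio(laplace), 3))     # expected below 0.8 (leptokurtic)
print(round(geary_ratio(rectangle), 3))   # expected above 0.8 (platykurtic)
```

Note the direction: a small ratio signals leptokurtosis, the reverse of β₂.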


The characteristics of this measure of kur- 
tosis have not been studied in the present 
undertaking; it may be that it is free from 
some of the anomalies of the older moment 
measure. 


* R. C. Geary and E. S. Pearson, Tests of Normality. London: Biometrika Office, University College, 1938. 15 p. This paper rests on a rather extended literature. References are given to earlier work by these two authors (1930, 1931, 1935, 1936), published in Biometrika; and to studies by R. A. Fisher (1929), Karl Pearson (1902), and P. ... Work in sampling distributions by C. C. ..., ... Fisher, and John Wishart, is cited in the earlier ...


Table V, “Probability Points of b₂,” is the same, except for some ... in the argument, as Table XXXVII contained in: Karl Pearson, editor, Tables for Statisticians and Biometricians, Part II, p. 224. London: Biometric Laboratory, University College, 1931. This table was published originally in: E. S. Pearson, “A Further Development of Tests for Normality,” Biometrika 22: 239-49, and 423-24; July 1930 and 1931. He stated later (1935): “The adequacy of the ... for the 5% and 1% limits for β₂ is at present more doubtful.” (Biometrika 27: 333.)
On recent developments in the sampling distribution of β₂: C. T. Hsu and D. N. Lawley, “... of the Fifth and Sixth Moments of the Distribution of b₂ in Samples from a Normal ...,” Biometrika 31: 238-48; March 1940.
CONE’S EXPLORATORY STUDY

Bonnie Cone* undertook an investigation
of certain properties of kurtosis based upon 
the following reasoning: if leptokurtosis 
means peakedness, how wide can the peak 
become, with reference to the base, before it 
changes over into platykurtosis, or flat- 
toppedness? Obviously as the shoulders of the 
peak become broader, with reference to the 
base, the curve approaches a rectangle— 
which is the limit of flat-toppedness (before 
passing over to the U-shaped distributions). 
Kurtosis is described as flatness or peaked- 
ness “in the region about the mode.” What 
are the limits of this region? Kurtosis is re- 
garded as excess, or defect, of frequency; but 
within what bounds? If leptokurtosis is over- 
concentration, within what range must the 
piling up occur, lest it be underconcentration 
instead? Can a pair of points be located on 
the base of the normal curve within which 
an increase in frequency will lead to a value 
of β₂ greater than 3, and beyond which an
increase in frequency will produce a kurtosis 
value less than 3? 

Employing a normal curve extending to 3.75σ on each side of the mean, Cone made seven exploratory attacks on this question, in each attack varying the number of intervals included in the central portion, or varying the height to which they were increased or decreased, or both. The effects were noted on values of β₂, Q/D, and its reciprocal, D/Q.

It became clear before the work was fin- 
ished that no single answer to the question 
would be obtained. As often occurs in re- 
search, the question asked was too simple. 
The proper degree of complexity of a concept 
that is required to parallel facts frequently 
emerges only after a number of exploratory 
investigations. In the present case, the em- 
phasis was upon shape, such as creating a 
flat-topped center which was progressively 
widened, adding a constant to each of the in- 
tervals in a gradually extending central range, 
or multiplying the frequencies in the neigh- 
borhood of the mode by a constant—thus 
preserving, in the latter two cases, either the 
absolute or relative shape of the normal 
curve across the experimental area. Variations in the amount by which the frequencies were increased were tried; and while these often went down to the base line, and up to twice the height of the midordinate, they were not sufficiently extensive to be final. For in kurtosis the size of the weight that is added is of as much importance as where it is added. This fact was not recognized until late in the work.

* Bonnie Ethel Cone, An Analysis of Measures of Kurtosis. Master’s thesis. Durham, N. C.: Duke University, 1940. 140 p.

The rationale is this: when there is no 
limit placed on the height to which an inter- 
val may be raised, any interval, or pair or 
group of intervals, may be raised to such 
heights that they dominate the entire distri- 
bution and the rest of the distribution be- 
comes negligible. For example, theoretically, 
any selected group of central intervals (ex- 
tending as far out as one might choose) 
might be raised to such an extreme height 
that the remaining (outer) intervals and 
their frequencies became negligible. When a 
frequency distribution is increased by adding 
a large constant to each frequency, so that 
the absolute shape of the original curve is 
preserved but the population has been greatly 
increased, the curvature tends to be obscured, and the value for β₂ decreases, approaching, presumably, the value for a rectangle, namely, 1.8. In ordinary practice, however, the value may decrease below 1.8, even when Sheppard’s corrections are applied. An obtained value below 1.8 does not therefore necessarily indicate a U-shaped distribution.*
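The "below 1.8" remark can be checked with a short sketch, assuming a histogram rectangle is modelled as k equal frequencies at equally spaced midpoints. The helper names are hypothetical; the closed form 3 − 6(k² + 1)/(5(k² − 1)) is the standard result for a discrete uniform distribution and is used only as a cross-check.

```python
def beta2_grouped(xs, freqs):
    """beta2 = mu4 / mu2**2 for frequencies `freqs` located at points `xs`."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(xs, freqs)) / n
    mu2 = sum(f * (x - mean) ** 2 for x, f in zip(xs, freqs)) / n
    mu4 = sum(f * (x - mean) ** 4 for x, f in zip(xs, freqs)) / n
    return mu4 / mu2 ** 2


def beta2_rectangle(k):
    """A 'rectangle' of k equal frequencies at equally spaced midpoints."""
    return beta2_grouped(list(range(k)), [1] * k)


def beta2_rectangle_closed_form(k):
    """Discrete-uniform formula 3 - 6(k^2 + 1) / (5(k^2 - 1))."""
    return 3 - 6 * (k * k + 1) / (5 * (k * k - 1))


for k in (9, 15, 101, 10001):
    print(k, round(beta2_rectangle(k), 4), round(beta2_rectangle_closed_form(k), 4))
```

With 9 and 15 intervals this gives 1.77 and about 1.79, and the value creeps up to the continuous-rectangle limit of 1.8 only as the number of intervals grows large.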

Kurtosis, then, as measured, reflects some element of the ratio of the curvature to its average height above the base line, and adding a constant to the frequency in each interval lowers the value of β₂. The same thing will normally be true for the Q/D formula, except that its value will increase, approaching 5/16, or 0.3125, which is the value for a rectangular distribution. We are likely to gain the impression that, since both the moment and rank formulas are divided by a standard unit of spread, they are absolute measures; but they are not at all absolute in the vertical direction, since both may be “diluted” by an increased population which might preserve the absolute shape over a given base line. If kurtosis (as measured) is not to change with population, all the ordinates must be multiplied by the proportion of change.
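A sketch of both points in the paragraph above. Two assumptions are made. First, Q is taken as the semi-interquartile range and D as the 10–90 percentile range; these definitions reproduce the 5/16 rectangle value quoted in the text. Second, "dilution" is modelled as adding the same constant c to every frequency of a 15-interval normal histogram with interval σ/2 and heights equal to the normal ordinates at the midpoints. The helper names are hypothetical.

```python
from statistics import NormalDist


def kelley_q_over_d(quantile):
    """Q/D with Q = semi-interquartile range, D = 10-90 percentile range
    (definitions assumed for this sketch)."""
    q = (quantile(0.75) - quantile(0.25)) / 2
    d = quantile(0.90) - quantile(0.10)
    return q / d


def beta2_grouped(xs, freqs):
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(xs, freqs)) / n
    mu2 = sum(f * (x - mean) ** 2 for x, f in zip(xs, freqs)) / n
    mu4 = sum(f * (x - mean) ** 4 for x, f in zip(xs, freqs)) / n
    return mu4 / mu2 ** 2


normal_qd = kelley_q_over_d(NormalDist().inv_cdf)   # about 0.263
rectangle_qd = kelley_q_over_d(lambda p: p)         # 5/16 = 0.3125

# Add the same constant c to every frequency of a normal histogram.
mids = [0.5 * i for i in range(-7, 8)]
ordinates = [NormalDist().pdf(x) for x in mids]
diluted = {c: beta2_grouped(mids, [f + c for f in ordinates])
           for c in (0.0, 0.1, 1.0, 100.0)}

print(round(normal_qd, 4), round(rectangle_qd, 4))
for c, b2 in diluted.items():
    print(c, round(b2, 4))
```

As c grows, β₂ falls from about 2.97 toward the value for a 15-interval rectangle (about 1.79), even though the absolute shape of the curve above the added constant is unchanged.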

* The calculated value for a rectangle with 9 intervals is 1.77; for 15 intervals it is 1.79. The value 1.8 is a limit approached as the number of intervals becomes infinite. ... Davis, ..., p. 318-19. Second ed., rev. and enl. Bloomington, Indiana: Principia Press, 1937.
Returning to Cone’s analysis: in addition to throwing the nature of the problem into clearer relief, her work yielded a number of significant practical conclusions which we shall now note.

Criterion values.—A value of 3 will not be obtained for a histogram approximation to the normal curve without using rather small intervals and including a fairly wide base. With 15 intervals of σ/2 each, extending over 7.5σ, the value is approximately 2.97. Increasing the range to 11.5σ gives the theoretical value correct to four significant figures when uncorrected, and to five significant figures when corrected. This is with fairly large intervals, σ/2. For Q/D and fifteen intervals the value is 0.265. These calculations were made on a histogram of 15 intervals in which the midpoints of the intervals were made the height of normal curve ordinates for corresponding points. Values of σ came out very close to 1; values of μ₄ varied more from the expected value of 3. The conclusion is that the theoretical values can be fairly closely attained by practical calculations with a histogram approximation to the normal curve, under conditions no more favorable than those indicated, namely, a fairly large interval and a limited baseline.
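The criterion values can be reproduced approximately with a short sketch. It assumes an ordinate-fitted histogram, in which each interval's height is the unit-normal ordinate at its midpoint, with intervals of σ/2 centred on the mean, and it uses uncorrected grouped moments. The function names are hypothetical.

```python
from statistics import NormalDist


def ordinate_histogram(n_intervals, width):
    """Midpoints centred on the mean (0); each height is the unit-normal
    ordinate at that midpoint (an ordinate-fitted histogram)."""
    centre = (n_intervals - 1) / 2
    mids = [(i - centre) * width for i in range(n_intervals)]
    return mids, [NormalDist().pdf(x) for x in mids]


def sigma_and_beta2(mids, freqs):
    """Uncorrected grouped moments; no Sheppard's corrections applied."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    mu2 = sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / n
    mu4 = sum(f * (x - mean) ** 4 for x, f in zip(mids, freqs)) / n
    return mu2 ** 0.5, mu4 / mu2 ** 2


results = {}
for n in (15, 23):                      # ranges of 7.5 sigma and 11.5 sigma
    results[n] = sigma_and_beta2(*ordinate_histogram(n, 0.5))
    sigma, beta2 = results[n]
    print(f"{n} intervals over {n * 0.5} sigma: sigma = {sigma:.4f}, beta2 = {beta2:.4f}")
```

Under these assumptions the 15-interval histogram gives σ close to 1 and β₂ of about 2.97, and widening the base to 23 intervals brings β₂ to 3.000.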

Number of intervals—How sensitive is 
the calculation of kurtosis to the number of 
intervals employed? This question has 
already been answered in part. Cone found 
that it is possible to go down to 5 intervals 
(covering a range of 7.5σ) with a kurtosis
value in error by 0.1, but that a smaller num- 
ber of intervals gave erratic results. The 
rank formula became jumpy with fewer than 
10 intervals. 
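The sensitivity to the number of intervals can be explored with the following sketch. The text does not say which fitting Cone used for this comparison, so the sketch shows both. In an ordinate-fitted histogram each height is the normal ordinate at the interval's midpoint. In an area-fitted histogram each frequency is the normal area inside the interval. The total range is fixed at 7.5σ, and the names are hypothetical.

```python
from statistics import NormalDist

ND = NormalDist()
TOTAL_RANGE = 7.5          # sigma units


def histogram(n_intervals, fit):
    """n equal intervals spanning TOTAL_RANGE, centred on the mean.

    fit="ordinate": height = normal ordinate at the midpoint;
    fit="area":     frequency = normal area falling inside the interval.
    """
    width = TOTAL_RANGE / n_intervals
    centre = (n_intervals - 1) / 2
    mids = [(i - centre) * width for i in range(n_intervals)]
    if fit == "ordinate":
        freqs = [ND.pdf(x) for x in mids]
    elif fit == "area":
        freqs = [ND.cdf(x + width / 2) - ND.cdf(x - width / 2) for x in mids]
    else:
        raise ValueError(fit)
    return mids, freqs


def beta2(mids, freqs):
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    mu2 = sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / n
    mu4 = sum(f * (x - mean) ** 4 for x, f in zip(mids, freqs)) / n
    return mu4 / mu2 ** 2


for n in (3, 5, 7, 9, 15):
    b_ord = beta2(*histogram(n, "ordinate"))
    b_area = beta2(*histogram(n, "area"))
    print(f"{n:2d} intervals: ordinate-fitted {b_ord:.3f}, area-fitted {b_area:.3f}")
```

With both fittings, 5 intervals stay within 0.1 of 3, while 3 intervals go badly astray, in line with Cone's finding.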


Geary and Pearson* performed similar calculations with intervals of three sizes: σ/10, σ/5, and σ/3. They found that the change in interval size made little difference. Note, however, that the intervals experimented with in Cone’s study were all larger than these, ranging from σ/2 to 1.5σ.

Sheppard’s corrections—In general these corrections* were of slight service. They tended to improve the values of σ and μ₄ in the second or third figure, but had little effect on the ratio, β₂, and were often in the wrong direction. That is, for fifteen or fewer intervals the results were more nearly in agreement with theory when uncorrected, in about half of the cases. In the case of a 15-interval distribution, of σ/2 interval, fitted to the normal curve through making the area correct in each interval, the corrections improved the values of σ and μ₄, but made the ratio, β₂, poorer. In a similar distribution, but fitted to the normal curve through making the height of the histogram equal to the ordinate of the normal curve at the midpoint of each interval, the corrections made the values of σ, μ₄, and β₂ all poorer. Perhaps the difficulty is that the corrections were worked out for a smaller interval. It is significant, however, that when one most feels the need of such corrections (when his grouping is coarse) they are not dependably beneficial. For a number of intervals down to 5 they gave wavering results; for three intervals the corrections were absurd.

* R. C. Geary and E. S. Pearson, Tests of Normality, p. 6. London, England: Biometrika Office, University College, 1938.

* Karl J. Holzinger, Statistical Methods for Students in Education, p. 341. Boston: Ginn and Co., 1928.

G. Udny Yule and M. G. Kendall, An Introduction to the Theory of Statistics, p. 160. Eleventh edition. London: Charles ...

Given originally in the following: W. F. Sheppard, “On the Calculation of the Average Square, Cube, etc., of a Large Number of Magnitudes,” Journal Royal Statistical Society 6: 698-703; September 1897.
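The two 15-interval comparisons can be sketched as follows, assuming the usual form of Sheppard's corrections for a grouping interval h: μ₂ − h²/12 and μ₄ − (h²/2)μ₂ + 7h⁴/240. Both histograms use intervals of σ/2 over 7.5σ. One is area-fitted and the other ordinate-fitted. The names are hypothetical.

```python
from statistics import NormalDist

ND = NormalDist()
H = 0.5                                   # interval width, sigma/2
MIDS = [H * i for i in range(-7, 8)]      # 15 intervals spanning 7.5 sigma


def raw_moments(freqs):
    """mu2 and mu4 about 0; the histograms are symmetric, so 0 is the mean."""
    n = sum(freqs)
    mu2 = sum(f * x ** 2 for x, f in zip(MIDS, freqs)) / n
    mu4 = sum(f * x ** 4 for x, f in zip(MIDS, freqs)) / n
    return mu2, mu4


def sheppard(mu2, mu4, h):
    """Sheppard's corrections for grouping with interval h."""
    return mu2 - h ** 2 / 12, mu4 - (h ** 2 / 2) * mu2 + 7 * h ** 4 / 240


fits = {
    "area": [ND.cdf(x + H / 2) - ND.cdf(x - H / 2) for x in MIDS],
    "ordinate": [ND.pdf(x) for x in MIDS],
}
summary = {}
for name, freqs in fits.items():
    raw = raw_moments(freqs)
    corrected = sheppard(*raw, H)
    summary[name] = {
        label: (m2 ** 0.5, m4, m4 / m2 ** 2)       # (sigma, mu4, beta2)
        for label, (m2, m4) in (("raw", raw), ("corrected", corrected))
    }
    for label, (s, m4, b2) in summary[name].items():
        print(f"{name:8s} {label:9s} sigma={s:.4f} mu4={m4:.4f} beta2={b2:.4f}")
```

Under these assumptions the pattern reported above reappears. For the area-fitted histogram the corrections move σ and μ₄ toward 1 and 3 but move β₂ slightly away from 3. For the ordinate-fitted histogram all three move away.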


Geary and Pearson noted that “numerical 
investigations in progress suggest that unless 
the group interval is more than one-third the 
standard deviation it is really not necessary 
to apply any corrections to the moments.” 
Their statement is conservative; but one won- 
ders whether there is any use to apply the 
corrections when his intervals are larger. 


Central portion—The central range of a 
frequency distribution within which an in- 
crease in frequency will increase kurtosis is a 
function of the amount of increase, the shape 
of the original distribution, the formula em- 
ployed to measure kurtosis, and the width of 
the interval within which the increase occurs. 
For small increases in area (the permissible amount varying with the location), Cone found that β₂ would increase between the limits of +.74σ and −.74σ; this range is then the central portion which was sought.
It is not, however, by itself a mathematically 
sufficient condition; neither is it an exclusive 
condition. 

False kurtosis—Cone’s explorations revealed that outside of this “central portion” (which varies with several factors) the addition of moderate weights will decrease kurtosis, which is in line with the common concept. That is, broadening the shoulders of the curve makes it more platykurtic. The mathematical function quickly departs, however, from this simple concept, for beyond 2.33σ the addition of a moderate weight will increase kurtosis, the addition of a larger weight will return the value of β₂ to 3, and a still larger weight will yield a β₂ value indicating platykurtosis.
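One way to see where the central portion and the 2.33σ boundary come from is to differentiate β₂ with respect to a small symmetric weight. This sketch assumes a unit normal curve (N = 1, σ = 1) with a weight w, in units of N, added at +z and at −z. From β₂ = NΣx⁴/(Σx²)², that gives β₂ = (3 + 2wz⁴)(1 + 2w)/(1 + 2wz²)². Its slope at w = 0 works out to 2(z⁴ − 6z² + 3), which is positive inside about ±0.742σ, negative between 0.742σ and 2.334σ, and positive again beyond. The weight that returns β₂ to 3, 3/(4z⁴) − 3/(2z²) + 1/4, comes from setting the same formula equal to 3. The helper names are hypothetical.

```python
def beta2_with_weights(w, z):
    """beta2 for a unit normal curve (N = 1, sigma = 1) with an extra
    weight w (in units of N) placed at +z and at -z."""
    return (3 + 2 * w * z ** 4) * (1 + 2 * w) / (1 + 2 * w * z ** 2) ** 2


def initial_slope(z):
    """d(beta2)/dw at w = 0, which works out to 2(z^4 - 6z^2 + 3)."""
    return 2 * (z ** 4 - 6 * z ** 2 + 3)


def numerical_slope(z, h=1e-5):
    """Central-difference check of initial_slope."""
    return (beta2_with_weights(h, z) - beta2_with_weights(-h, z)) / (2 * h)


for z in (0.3, 0.6, 1.0, 2.0, 2.5, 3.0):
    print(f"z={z}: slope {initial_slope(z):+.3f}, beta2 with w=0.01: "
          f"{beta2_with_weights(0.01, z):.4f}")

# Beyond 2.33 sigma the effect depends on the size of the weight (z = 3 here).
w_back_to_3 = 3 / (4 * 3 ** 4) - 3 / (2 * 3 ** 2) + 0.25   # about 0.0926
for w in (0.05, w_back_to_3, 0.2):
    print(f"z=3, w={w:.4f}: beta2 = {beta2_with_weights(w, 3.0):.4f}")
```

At z = 3 a 5% weight raises β₂ above 3, about 9.26% returns it to 3, and 20% pushes it below 3, matching the three stages described in the text.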

These findings are most disturbing to one who has accepted the common notion that β₂ is a measure of excess, overconcentration, peakedness, etc., for the very opposite of these conditions may give the same values for β₂ as the leptokurtic conditions will. One cannot tell, then, from his calculated value of β₂ whether his curve is really leptokurtic or just the opposite. He will have to rely upon his visual inspection of the curve more than (or in addition to) his calculations; and in many cases where departures from the normal are distributed irregularly along his observed distribution one will be incapable of judging what the calculated value really means.


The conditions referred to are objectionable not only in giving a false indication of leptokurtosis; they may equally readily give a false indication of platykurtosis, or of normality. For example, there is an infinity of conditions under which a value of 3 can be obtained for β₂ for a curve which is not normal. Perhaps it is for such reasons that the sampling error of β₂ is difficult of assessment.


It appears that this property has, at least in part, been understood previously,* though a search of the literature did not reveal any explicit statement. The recognition given to the tails by Fisher is more suggestive than accurate, for it represents sufficient but not necessary, and not properly restricted, conditions. That is, in the first place, the tails need not be involved (need not depart from their normal curve shape) for a value of β₂ greater than 3; or, contrariwise, the central portion need not change its shape or height. In fact neither the central portion nor the tails need be increased to produce leptokurtosis, as measured: a decrease of 1% or more of the area at ±1σ will produce a value of β₂ greater than 3.
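The ±1σ example can be checked with the symmetric-weight formula for a unit normal curve, β₂ = (3 + 2wz⁴)(1 + 2w)/(1 + 2wz²)², with w in units of N. The formula follows from β₂ = NΣx⁴/(Σx²)². At z = 1 it collapses to (3 + 2w)/(1 + 2w), which exceeds 3 for every w between −0.5 and 0. The sketch below is illustrative only and the helper name is hypothetical.

```python
def beta2_with_weights(w, z):
    """Unit normal curve with weight w (units of N) added at +z and -z."""
    return (3 + 2 * w * z ** 4) * (1 + 2 * w) / (1 + 2 * w * z ** 2) ** 2


# At z = 1 the formula collapses to (3 + 2w) / (1 + 2w).
decreases = (-0.01, -0.05, -0.10, -0.25, -0.45)
at_one_sigma = {w: beta2_with_weights(w, 1.0) for w in decreases}
for w, b2 in at_one_sigma.items():
    print(f"remove {-w:.0%} at each of +/-1 sigma: beta2 = {b2:.4f}")
```

Removing area at the shoulders, with no change at the center or in the tails, therefore registers as leptokurtosis.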


* For example, R. A. Fisher in his Statistical Methods for Research Workers (sixth edition) notes: “A measure of departure from normality ... by which the apex and the two tails of the curve are increased at the expense of the intermediate portion, or when ..., the top and tails are depleted and the shoulders filled out.” (p. 54). Edinburgh: Oliver and Boyd, 1936.
The statements just made assume a normal distribution which is distorted at various points; it is recognized that if one starts with a distribution having such distortions and fits a normal curve to it, by methods of curve fitting, the original distribution may show up relatively in the manner indicated by Fisher, though there are many conditions which will render even this generalization questionable. For example, a very large weight can be added near the mean without an increase in β₂. It is difficult to make a simple statement about kurtosis that will be revealing and accurate. It represents the interplay of several factors.


ALGEBRAIC EXPLORATION OF KURTOSIS


The nature of kurtosis, as measured by β₂, can be indicated rather clearly by certain algebraic representations. We shall proceed to add a weight, w, to the normal curve at a point z = x/σ (that is, at x distance from the mean measured in units of σ). In order to preserve symmetry, the weight w will be added at the same distance above and below the mean in all cases, so that the area in the curve is increased (or decreased if w is negative) by 2w.


In moment form, with x measured from the mean, β₂ = μ₄/σ⁴ = NΣx⁴/(Σx²)². With added weights, w, at points ±z, this becomes

$$\beta_2 = \frac{(\Sigma x^4 + 2wz^4)(N + 2w)}{(\Sigma x^2 + 2wz^2)^2} \qquad (1)$$

This statement is general, holding for a curve of any shape, insofar as kurtosis may be regarded as applying to the given shape. The weight w is in units of N; that is, when w = 0.1, 10% of the area of the entire distribution is added (or subtracted) at each of two points.

We may now take advantage of the fact that, for a unit normal curve (having σ = 1 and N = 1), Σx² = 1 and Σx⁴ = 3. We may then write:

$$\beta_2 = \frac{(3 + 2wz^4)(1 + 2w)}{(1 + 2wz^2)^2} \qquad (2)$$

(For a normal curve distorted by 2w.)

If w = 0, this expression reduces to 3, which is correct for the normal distribution.
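Equation (2) can be checked against a brute-force calculation in a short sketch. The unit normal curve is tabulated on a fine grid out to ±12σ, two point weights w are appended at ±z, and μ₄/μ₂² is computed directly. The grid step, the cutoff, and the function names are assumptions of the sketch. For a smooth, rapidly decaying curve an equally spaced sum of this kind reproduces the moments essentially exactly.

```python
from statistics import NormalDist


def beta2_eq2(w, z):
    """Equation (2): unit normal curve plus weight w (units of N) at +z and -z."""
    return (3 + 2 * w * z ** 4) * (1 + 2 * w) / (1 + 2 * w * z ** 2) ** 2


def beta2_direct(w, z, step=0.01, limit=12.0):
    """Brute force: tabulate the unit normal on a fine grid out to +/-limit,
    append the two point weights, and compute mu4 / mu2**2 about 0
    (the distorted curve stays symmetric, so its mean is 0)."""
    nd = NormalDist()
    k = int(round(limit / step))
    xs = [i * step for i in range(-k, k + 1)]
    fs = [nd.pdf(x) * step for x in xs]
    xs += [z, -z]
    fs += [w, w]
    n = sum(fs)
    mu2 = sum(f * x ** 2 for x, f in zip(xs, fs)) / n
    mu4 = sum(f * x ** 4 for x, f in zip(xs, fs)) / n
    return mu4 / mu2 ** 2


cases = [(0.0, 1.0), (0.1, 0.5), (0.3, 2.0), (-0.2, 1.5)]
for w, z in cases:
    print(w, z, round(beta2_eq2(w, z), 6), round(beta2_direct(w, z), 6))
```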

If we desire to see what combinations of w and z will produce a β₂ value the same as that for a normal distribution we may set the above equation equal to 3, that is, (3 + 2wz⁴)(1 + 2w) − 3(1 + 2wz²)² = 0, and after expanding and rearranging obtain

$$-2w\left[(4w - 1)z^4 + 6z^2 - 3\right] = 0 \qquad (3)$$

(For β₂ = 3.) Solving this for w gives w = 0, which is trivial, and

$$w = \frac{3}{4z^4} - \frac{3}{2z^2} + .25 \qquad (4)$$

(For β₂ = 3.)


We may substitute various values of z in this expression and obtain values of w (apart from w = 0) which will leave β₂ = 3. Plotting such values gives Figure 1. It will be seen that as z approaches 0 the curve gets indefinitely high. For z = 0.5, w = 6.25. That is, at ±σ/2 we may add weights equal to six and one-quarter times the entire area of the normal curve and still obtain a value of β₂ that indicates normal kurtosis. For z = 0.1, w = 7,350.25, and for z = 0.001, w is approximately 750 billion. As stated earlier, one would not mistake such monstrous distortions for normal distributions, yet there is nothing in the calculated value to show that they are not normal; and there are many lesser departures from the normal which would not be visually apparent and which would not be indicated by β₂.
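The values quoted above follow directly from (4). This sketch evaluates (4) at a few points and substitutes each weight back into (2) to confirm that β₂ stays at 3. The function names are hypothetical. The point z = 1, where w reaches its minimum of −0.5, is excluded from the substitution check because there the total area, and with it the denominator of (2), becomes zero.

```python
def weight_for_normal_beta2(z):
    """Equation (4): the non-zero weight at +z and -z (units of N) that
    leaves beta2 at the normal value 3."""
    return 3 / (4 * z ** 4) - 3 / (2 * z ** 2) + 0.25


def beta2_eq2(w, z):
    """Equation (2)."""
    return (3 + 2 * w * z ** 4) * (1 + 2 * w) / (1 + 2 * w * z ** 2) ** 2


for z in (0.001, 0.1, 0.5, 2.0, 3.0, 100.0):
    w = weight_for_normal_beta2(z)
    print(f"z = {z}: w = {w:.6g}, beta2 = {beta2_eq2(w, z):.6f}")

print(weight_for_normal_beta2(1.0))   # the minimum, -0.5
```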

We must not assume from the above that an infinite weight can be added at the exact mean without affecting β₂, for there is an infinite discontinuity at this point. So long as z has any value at all, no matter how small, w is very large, but when z becomes 0, w becomes 0 if the condition β₂ = 3 is to be maintained. This fact may be seen by letting z = 0 in (3), or setting (2) equal to 3 with z = 0. A distribution which is otherwise normal cannot be raised or lowered at the mean (alone) without affecting β₂; although practically the weight would probably not be applied solely at the exact mean, but over an interval having more or less width, so that some freedom in the frequency of the central interval of a distribution may be permitted without disturbing β₂.

Figure 1. Amount of area which can be added to the normal curve at various points without affecting the value of β₂. It is assumed that an equal amount will be added to both sides of the normal distribution, at the same distance from the mean. The diagram represents only the right half of the baseline of the distribution; a similar diagram with negative z scale would represent the left half. The added (or subtracted) area, w, is in units of N, the area of the original normal distribution. At w = 0, the curve is unaltered; at w = .5, areas are added at two symmetrical points, thus doubling the original area. (The total area is 2w + 1.)
Again it should be noted that the curve in Figure 1 is one of exact values; it does not mean “all of the area up to the curve.” Weights lying close to the curve would produce values approximately equal to 3; but in general, weights lying off the line any distance will produce values markedly above or below 3, in accordance with the regions roughly indicated in Figure 1. One will note that the curve runs between regions of platykurtosis and leptokurtosis. But in Figure 1 the axis at w = 0 is also a curve satisfying the condition β₂ = 3, and it also divides regions of opposite character.

The points of intersection of the w = 0 axis and the curve, in Figure 1, are of some interest. The writer had noted that the points (“nodes”) which Miss Cone located as bounding the “central portion” of the normal curve were approximately at the points where the fourth derivative of the normal curve* also changes from positive to negative, or vice versa. That is, for the range in which the fourth derivative is negative, an addition to the normal curve would necessarily decrease β₂ from 3. Outside of this range (and excepting the point at the mean) the effect of an increase in area would depend upon the size of the increase.

This observation was referred to Professor Joseph A. Greenwood* for explanation; he pointed out that when w = 0, (3) reduces to the fourth Hermite polynomial* set equal to 0. The fourth derivative of the normal curve vanishes when the fourth Hermite polynomial vanishes. The roots of z⁴ − 6z² + 3 = 0 are

$$z = \sqrt{3 - \sqrt{6}} \quad \text{and} \quad z = \sqrt{3 + \sqrt{6}},$$

and also another pair having the same values but a different sign (hence applying to the negative side of the curve). These values are approximately 0.74196379 and 2.3344142. They are the points at which the curve in Figure 1 crosses the axis w = 0. There are accordingly three points at which the normal curve cannot be varied (assuming that the variation occurs precisely at a point, and that there is no variation at any other point to compensate) if β₂ is to remain 3. These points are at z = 0, 0.742, and 2.334.

* H. L. Rietz, editor, Handbook of Mathematical Statistics, p. 209-16. Boston: Houghton Mifflin Co., 1924.

This table is given in more extended form in: James W. ..., editor, Tables of Applied Mathematics in Finance, In..., pp. 394-413. Ann Arbor, Mich.: George ...

* Formerly of Duke University Department of Mathematics; now a lieutenant in the United States Navy Reserve. The writer is indebted to Dr. Greenwood for ... and aid. ... specific shapes of distribution other than the normal were undertaken by Dr. Greenwood; these will probably be published elsewhere.

* James V. Uspensky, Introduction to Mathematical Probability, p. 72. New York: McGraw-Hill Book Co., 1937.

These “nodes” do not have the property 
which was anticipated, namely, of permitting 
arbitrary variation in w without effect on β₂;
they do not permit any variation whatever. 
No point on the base line has been discovered 
which would permit arbitrary variation in the 
area without effect on β₂; though clearly
various combinations could be made by vary- 
ing w at two or several points instead of only 
one point in each half of the curve, and these 
different patterns of combination would per- 
mit a great degree of latitude. For example, 
as will be seen from Figure 1, large incre- 
ments near the mean (but falling neverthe- 
less to the left of the curve) tend towards 
leptokurtosis, and these could be offset by 
moderately large increments at any point to 
the right of, and above, the curve. Negative 
weights may also be brought into the combi- 
nations which carry the distribution away 
from normality but not away from a value of β₂ = 3.
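The nodes can be computed and checked in a few lines. This sketch takes the two positive roots of the fourth Hermite polynomial (probabilists' form, z⁴ − 6z² + 3) and confirms that, at each node and at the mean, any nonzero weight moves β₂ in (2) away from 3. At the two nodes, subtracting 3 from (2) leaves −8w²z⁴/(1 + 2wz²)², which vanishes only when w = 0. The function names are hypothetical.

```python
import math


def hermite4(z):
    """Fourth Hermite polynomial, z^4 - 6z^2 + 3 (probabilists' form)."""
    return z ** 4 - 6 * z ** 2 + 3


def beta2_eq2(w, z):
    """Equation (2)."""
    return (3 + 2 * w * z ** 4) * (1 + 2 * w) / (1 + 2 * w * z ** 2) ** 2


inner = math.sqrt(3 - math.sqrt(6))   # about 0.74196379
outer = math.sqrt(3 + math.sqrt(6))   # about 2.3344142
nodes = (0.0, inner, outer)

for z in nodes:
    shifted = [round(beta2_eq2(w, z), 4) for w in (-0.1, 0.1, 1.0)]
    print(f"z = {z:.8f}: beta2 for w = -0.1, 0.1, 1.0 -> {shifted}")
```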

From (4) we may obtain the fact that w has a minimum value when z = 1, this minimum being −.5. Also from (4) we find that the asymptotic ceiling for w, as z approaches infinity, is 0.25. We may take the roots of (3) which evaluate z, getting:

$$z = \pm\sqrt{\frac{-3 \pm \sqrt{6 + 12w}}{4w - 1}} \qquad (5)$$

(For β₂ = 3.)

With this equation we can determine the point (in each half of the normal curve) at which a weight of given size would have to be placed to leave β₂ at the value for a normal distribution. If we let w = 0, (5) gives the nodal values for z, which have already been noted.

If w is between 0 and 0.25 there will be two values of z in (5). If w = 0.25, (5) has 0 in the denominator, but (3) yields z = √.5 = .7071. For w greater than 0.25 there will be only one point on each side of the mean given by (5), and this will be obtained by using the plus sign in the numerator. If w = −0.5, z = 1; while this value appears in Figure 1, and is the minimum, it must be looked upon with question, for in this case the total area of the curve has been reduced to a mathematical zero. It may mean something as a point on a curve, but probably not when taken by itself. When w = −0.5 is substituted in several of the other formulas it gives undependable results. For greater negative values than −0.5, (5) gives values of z which are imaginary.
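The case analysis for (5) can be followed in code. This sketch treats the bracket of (3) as a quadratic in u = z², handles w = 0.25 separately, and returns the positive real z values, or an empty list when the roots are imaginary. The function names are hypothetical. The code also finds two roots for negative w between −0.5 and 0, which corresponds to the part of the Figure 1 curve that dips to its minimum of −0.5 at z = 1.

```python
import math


def z_for_weight(w):
    """Positive z (sigma units) at which a weight w at +z and -z leaves beta2 = 3.

    Treats (4w - 1) z^4 + 6 z^2 - 3 = 0, the bracket of (3), as a quadratic
    in u = z^2, i.e. equation (5). Returns a sorted list (empty if no real root).
    """
    a = 4 * w - 1
    if a == 0:                       # w = 0.25: (3) reduces to 6z^2 - 3 = 0
        return [math.sqrt(0.5)]
    disc = 6 + 12 * w
    if disc < 0:                     # w below -0.5: z would be imaginary
        return []
    roots = set()
    for sign in (1, -1):
        u = (-3 + sign * math.sqrt(disc)) / a
        if u > 0:
            roots.add(math.sqrt(u))
    return sorted(roots)


def weight_for_normal_beta2(z):
    """Equation (4), used here only to check the round trip."""
    return 3 / (4 * z ** 4) - 3 / (2 * z ** 2) + 0.25


for w in (-0.6, -0.5, -0.2, 0.0, 0.1, 0.25, 1.0, 6.25):
    print(w, [round(z, 4) for z in z_for_weight(w)])
```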

Figure 1 represents only one of a family of 
curves which could be drawn for various 
values of β₂.

In order to reveal the relationship of the weights portrayed in Figure 1 to the actual frequencies in a normal distribution, Figure 2 is presented. This diagram represents the curve of Figure 1 superposed on a normal histogram, so that the top of the histogram now takes the place of the baseline (w = 0) in Figure 1, and departures are now measured from this variable base instead of from a single axis. The variations indicated in Figure 2 are those for the midpoint of each interval, though the entire width of the interval is considered in determining the amount of change in the height of the histogram. The intervals centered at ±0.5σ would be raised 31 times the height of the middle interval; the next three pairs of intervals would be reduced to below the baseline; and farther out, intervals could be raised considerable amounts, to produce β₂ = 3. Values below the baseline are theoretical, though in the neighborhood of 2 to 2.33σ the required reduction could easily occur in an actual distribution through being spread over a slightly wider interval. If narrower intervals had been used the amount of change required would have been greater, in proportion.

Values of β₂ other than 3. Up to this point we have been considering variations in the normal distribution which would yield a value of 3 for β₂. We shall now consider certain other values. One wonders, for example, when looking at Figure 1, what the vertical distribution of β₂ values would be at each value of z, such as would be shown by other curves representing other constant values of β₂. The calculations necessary to show such curves have not been made. Instead of curves representing constant values of β₂, we
may observe the effects of constant weights 
at different points, and two curves for this 
purpose are presented in Figure 3. One curve 
represents an addition of 2.5%, and the 
other curve a depletion of 2.5%. These 
changes in area are relatively small, and such 
as might occur in practice at almost any point on the baseline out to some 2.5σ.


As stated earlier, for weights less than 25% the curve of β₂ values will equal 3 at two points; both of the curves in Figure 3 therefore cross the β₂ = 3 line in two places. These points may be obtained from (5), or from Figure 1 (if drawn to a larger scale) by reading from a given w across to the point (or points) at which this value is intersected by the curve, and then down to the z value on the baseline. Relatively small weights were used in the present cases so that the two curves cross the criterion line at approximately the same point the first time. One interpretation of this graph is that, for any given weight, a point (or two points) on the baseline can be found, in each half of the normal curve, at which this weight can be added without affecting β₂.
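The crossing points of the two Figure 3 curves can be read from (5) rather than from the graph. This sketch assumes −0.5 < w < 0.25, so that both roots of (5) are real and positive, and applies it to w = +0.025 (curve a) and w = −0.025 (curve b). The function names are hypothetical.

```python
import math


def beta2_eq2(w, z):
    """Equation (2)."""
    return (3 + 2 * w * z ** 4) * (1 + 2 * w) / (1 + 2 * w * z ** 2) ** 2


def crossings(w):
    """Positive z where beta2 returns to 3 for weight w (equation (5));
    assumes -0.5 < w < 0.25, so that both roots are real and positive."""
    root = math.sqrt(6 + 12 * w)
    return sorted(math.sqrt((-3 + s * root) / (4 * w - 1)) for s in (1, -1))


for label, w in (("curve a", 0.025), ("curve b", -0.025)):
    first, second = crossings(w)
    print(f"{label} (w = {w:+.3f}): crosses 3 at z = {first:.3f} and z = {second:.3f}")
```

The first crossings fall close together near 0.74σ (about 0.738σ and 0.746σ), while the second crossings separate beyond 2σ (about 2.47σ and 2.21σ).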


The two curves are not symmetrical with respect to the criterion value 3. In general, they are negatively correlated, as we should expect. We may be primarily interested in knowing the relative stability of β₂ at different points on the baseline, under the influence of different weights. That is, is there some point at which various weights (changes in area) have little effect on the value of β₂? And at what point is the maximum effect to be expected? It may be possible that a measure of dispersion of β₂ values could be minimized with respect to values of z, but this does not seem likely, for each value of w has its own curve, and one would somewhat arbitrarily have to decide upon what values of w to include. We might, however, point out, from an inspection of Figure 1, that moderate weights yield β₂ in the neighborhood of 3 when z has values from about 0.70 to 0.85. On the other hand, we can pick a point of marked dispersion, for all of the curves of added area have a minimum, and the curves of reduced area have a maximum, at z = √3. Whether, however, such a dispersion would be greater than the spread after the second crossing of the normal axis is questionable. We can, at least, say that for small weights (positive or negative) the point of considerable ineffectiveness is at about 0.75σ, and that at √3σ the dispersion (and the effect) will be greater than at or near the mean.

Figure 3. Values of β₂ produced by a given weight added to (or subtracted from) a normal distribution at symmetrical points. Curve a represents +2.5%; curve b represents −2.5%. Percent is based on entire area of the distribution; only the right half of the baseline is shown. Curves cross the value 3 at approximately 0.74σ, though not both at the same point, and again beyond 2σ. Curve b drops to negative infinity and immediately starts upward again.

For values of β₂ other than 3, (3) and (4) no longer apply. We therefore start again with (2) and derive the following equations which are not restricted as to the value of β₂, but which assume, as before, that we start with a normal distribution and add or subtract weights at symmetrical points.


When z = 0 (local maximum for +w, local minimum for −w):

$$\beta_2 = 3(1 + 2w) \qquad (6)$$

When z = √3 (minimum for +w, maximum for −w):

$$\beta_2 = \frac{3(1 + 2w)}{1 + 6w} \qquad (7)$$

As z → ∞ (asymptote at infinity):

$$\beta_2 \to \frac{1 + 2w}{2w} \qquad (8)$$

Where β₂ equals the normal value, 3: substitute the given w in (5).

At the inflection points:

$$z = \pm\sqrt{\frac{1 + 10w \pm \sqrt{100w^2 + 12w + 1}}{4w}} \qquad (9)$$


The value for β₂ at the mean is, for positive w, a calculus maximum, and may be exceeded at some other point (as in the tails). It means that a weight added at the mean will make β₂ larger than the same weight would if added at any other nearby point. We note from (8) that by making w very small we can increase β₂ without limit, if the weight is placed very far out. We thus find that a weight has more of a tendency to make a distribution leptokurtic (as measured by β₂) the farther out in the tails it is placed!

(We may observe that a small weight, such as 0.001, or one-tenth of one percent of the area, does not give a high value for β₂ as far out as 5σ, the value being only 3.86. The asymptote for this weight is, however, a β₂ value of 501. The explanation is that the curve turns up relatively late; its first inflection point is at 1.00σ, and its second inflection point is at 22.45σ. At 30σ it has attained a β₂ value of 207.)
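The figures for a 0.1% weight can be reproduced from (2), (8), and (9). This sketch also checks (6) and (7) at another weight, and confirms that the curvature of β₂ really does change sign at the points given by (9), using a finite-difference second derivative. The function names are hypothetical.

```python
import math


def beta2_eq2(w, z):
    """Equation (2)."""
    return (3 + 2 * w * z ** 4) * (1 + 2 * w) / (1 + 2 * w * z ** 2) ** 2


def asymptote(w):
    """Equation (8): limit of beta2 as z grows without bound."""
    return (1 + 2 * w) / (2 * w)


def inflection_points(w):
    """Equation (9): positive z where the beta2 curve changes curvature."""
    root = math.sqrt(100 * w ** 2 + 12 * w + 1)
    return sorted(math.sqrt((1 + 10 * w + s * root) / (4 * w)) for s in (-1, 1))


def second_difference(w, z, h):
    """Finite-difference estimate of d^2(beta2)/dz^2, for checking (9)."""
    return (beta2_eq2(w, z + h) - 2 * beta2_eq2(w, z) + beta2_eq2(w, z - h)) / h ** 2


w = 0.001
print(round(beta2_eq2(w, 5.0), 2))                   # 3.86
print(round(asymptote(w), 6))                        # 501.0
print([round(z, 2) for z in inflection_points(w)])   # [1.0, 22.45]
print(round(beta2_eq2(w, 30.0)))                     # 207
```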
Unreal values of β₂. It is easy to continue the various curves shown beyond the points which actual data could yield. The limit of practical data is difficult to specify, for it depends on how unusual a case one wishes to assume. The value of β₂ for a rectangular distribution is 1.8, and β₂ drops to 1.0 as a U-shaped distribution gradually disintegrates
into two single points of equal weight. Start- 
ing with a normal distribution, we could 
approximate this with two extremely large 
weights—placed anyplace, if the weights 
were large enough so that the original dis- 
tribution became negligible. 
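The limiting value of 1.0 can be seen from (2). As w grows very large, the numerator behaves like 4w²z⁴ and so does the denominator, so β₂ tends to 1 whatever the value of z. This sketch evaluates (2) for increasing weights at several points and compares the result with two equal point masses at ±a, for which μ₂ = a² and μ₄ = a⁴. The distance a = 1.7 and the function names are arbitrary choices for illustration.

```python
def beta2_eq2(w, z):
    """Equation (2)."""
    return (3 + 2 * w * z ** 4) * (1 + 2 * w) / (1 + 2 * w * z ** 2) ** 2


def beta2_two_points():
    """Two equal point masses at -a and +a: mu2 = a^2, mu4 = a^4, so beta2 = 1."""
    a = 1.7                      # any non-zero distance gives the same result
    mu2, mu4 = a ** 2, a ** 4
    return mu4 / mu2 ** 2


for z in (0.3, 1.0, 2.0):
    print(z, [round(beta2_eq2(w, z), 4) for w in (1, 100, 1e6)])
print(beta2_two_points())
```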


We may be certain of values impossible in practice by taking β₂ = 0. If we set (2) equal to 0 (instead of 3, as before) and solve, we get w = −0.5 and w = −3/(2z⁴). The first value has been noted before: when half the area of the original curve is subtracted at each of two points, the curve ceases to exist, mathematically at least. If we substitute the second value in (2) we find that it makes the first parenthesis equal to 0. In other words, this weight is such a value that it makes the fourth moment zero. We have thus obtained values which will, in turn, make each of the parentheses in the numerator of (2) equal to 0, a result which we could have picked out by inspection. We have, nevertheless, conditions which could exist only if our distribution lost all of its frequencies, or lost all of its dispersion and were concentrated at a single point.


We may note further that we cannot substitute a weight of −1/6 in (7), nor even in (2) with a z value of √3, for at this point this weight makes σ have a value of zero. Hence this denominator becomes another case of a mathematical zero. To get the minimum for w = −1/6 one would have to plot a curve based on values of z on both sides of √3.


If we are willing to indulge further in impossibilities for the sake of getting better acquainted with the purely mathematical properties of β₂, we can consider negative values of β₂. We note from Figure 3 that one curve is going downward very rapidly, and apparently nothing will stop it. This is so even though the weight subtracted is moderate. If we examine the numerator of (2), it is apparent from the first parenthesis that for any negative w a value of z will sooner or later be reached which will make −2wz⁴ exceed 3. From this point on β₂ will be negative.

We may note further, and again in the realm of the impossible, that the other parenthesis in the numerator of (2) may also be negative, if w is more negative than −0.5. Such a condition postulates a mirror curve, one wholly (or on the average) below the baseline of zero frequency, so that N will be at least slightly negative. With two signs to vary, we have several possibilities of positive and negative values for β₂ produced by combinations of our impossible conditions; not every positive β₂, therefore, is to be regarded as real.

Returning to curve b in Figure 3: since w is negative but numerically less than 0.5, the second parenthesis in (2) is positive, and the first is evidently negative. Further, the numerator is increasing (negatively) and the denominator continues to decrease. At a certain point, however, the unsquared value of the denominator also turns negative, and at this point the term 2wz² is no longer reducing the value of 1 but starts building a negative excess over the 1. Since the denominator is squared, this value becomes positive, so that β₂ continues negative but decreases in negative magnitude. There is therefore a cusp at negative infinity, the curve gradually climbing back to its asymptote at −19.

We may locate these critical points by an inspection of (2). They will be the points at which the parentheses were noted to become 0. Now, however, we are interested in expressing them in terms of w, to yield z; we may accordingly write

$$z > \sqrt[4]{\frac{3}{-2w}} \quad \text{for making } 3 + 2wz^4 \text{ negative}, \qquad z > \sqrt{\frac{1}{-2w}} \quad \text{for making } 1 + 2wz^2 \text{ negative}.$$

If the area is also to be negative, −w > 0.5. For the curve in Figure 3, the numerator becomes negative at 2.730; i.e., the curve crosses the zero axis at this point. The denominator turns at 4.470; hence at this point the curve will start upwards again, but never to cross the zero line. A curve representing a distorted normal distribution which had a net negative area would, in general, behave in the opposite fashion.

One more glance at our imaginary world holds some interest. At z = 2, we find that a value of w = −.078125 will produce a β₂ value of 3 (normal); if w = −.086625, β₂ becomes 2; by dropping the weight down to −.090909, we get a value of 1; and by making w = −.0938, β₂ is 0. We have a range of beta values from normal down to zero caused by a change of only 1.57 percent of the area. If this occurred on the positive side we should be somewhat alarmed at the undue sensitiveness of β₂ at this point. But there are enough amazing characteristics of β₂ within the realm of real distributions to attract and hold our fancy.
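These four values can be checked directly, assuming equation (2) has the form β₂ = (3 + 2wz⁴)(1 + 2w)/(1 + 2wz²)² (our reconstruction of it, consistent with its roots discussed earlier). In the sketch we take −1/11 and −3/32 as the exact values behind the rounded −.090909 and −.0938; the function name is ours.

```python
def beta2_with_weights(w, z):
    """Assumed form of equation (2): beta_2 of a unit normal curve after adding
    weight w (negative = area subtracted) at each of -z and +z."""
    return (3 + 2 * w * z**4) * (1 + 2 * w) / (1 + 2 * w * z**2) ** 2


# Weights quoted in the text for z = 2 (-1/11 and -3/32 are taken as the
# exact values behind the rounded -.090909 and -.0938)
weights = [-0.078125, -0.086625, -1 / 11, -3 / 32]
for w in weights:
    print(f"w = {w:+.6f}: beta_2 = {beta2_with_weights(w, 2):.3f}")
```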


CONCLUSIONS 


The term kurtosis was introduced by Karl Pearson in 1905. He used β₂ = μ₄/μ₂² to measure it, the value for the normal curve being 3. A rank formula, Q/D, was introduced by Kelley in 1921. Values of this formula run in the opposite direction to those of the moment formula. The criterion value for normal is 0.26315.
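Kelley's criterion value can be reproduced from the quantiles of the normal distribution. The sketch assumes Q is the semi-interquartile range and D the range from the 10th to the 90th percentile, the definitions under which the normal curve gives 0.26315; it uses only Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

nd = NormalDist()  # unit normal curve (mean 0, sigma 1)

q1, q3 = nd.inv_cdf(0.25), nd.inv_cdf(0.75)
p10, p90 = nd.inv_cdf(0.10), nd.inv_cdf(0.90)

Q = (q3 - q1) / 2  # semi-interquartile range
D = p90 - p10      # 10th-to-90th percentile range (assumed definition of D)

print(f"Q = {Q:.5f}, D = {D:.5f}, Q/D = {Q / D:.5f}")
```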


The idea of kurtosis is usually stated sim- 
ply, but the function measured by the 
moment formula, and to a large extent also 
by the rank formula, is not simple. The func- 
tion represents an interplay of several aspects 
of shape. 

Kurtosis is usually related to over- or underconcentration of frequencies (or area) near the mode of the distribution. From Pearson's statements at the time he introduced the term it is clear that he thought of it in that light: "The fact that β₂ is greater than 3 points to an emphasis of the modal frequency and to a reduction of the extreme frequencies. ... The expression β₂ − 3 (which measures whether the frequency towards the mean is emphasized more or less than that required by the Gaussian law) . . ."* Actually, however, β₂ is likely to be more sensitive to cases in the tails than to a given number of cases near the mode.


Kurtosis was introduced as one of three criteria of normality (the other two being measures of skewness).** It has continued to be used in this manner. Yet there is an infinity of ways in which a distribution may depart from the normal and yet give a value of β₂ which indicates normal. In fact, for any sized weight (additional frequency) there exists a pair of points (one or more on each side of the mean) at which the weight can be added to a normal curve without having any effect upon β₂. (This is true to a restricted extent for Q/D.)

* Karl Pearson, Biometrika 4: 170-71; June 1905.
** Ibid., p. 181.

If a very small weight is added far out in the tails of a normal distribution, β₂ is greatly increased; in fact the smaller the weight the greater the increase in β₂ when the weight is placed a considerable distance out.

Towards the mean (but not at the mean) very large areas (of a specific magnitude) may be added without changing β₂ from its value for a normal curve.
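The claim that any weight has points where it leaves β₂ unchanged can be made concrete. Assuming equation (2) has the form β₂ = (3 + 2wz⁴)(1 + 2w)/(1 + 2wz²)² (our reconstruction), setting it equal to 3 and dividing by 2w gives (1 − 4w)z⁴ − 6z² + 3 = 0, a quadratic in z². For small weights there are two such points on each side of the mean (near 0.74σ and 2.33σ as w approaches 0); for w ≥ 1/4 only the inner one remains, and it moves toward the mean as w grows. The helper names are hypothetical.

```python
import math


def beta2_with_weights(w, z):
    """Assumed form of equation (2): weight w added at each of -z and +z."""
    return (3 + 2 * w * z**4) * (1 + 2 * w) / (1 + 2 * w * z**2) ** 2


def neutral_points(w):
    """Positive z at which adding w at -z and +z leaves beta_2 = 3.

    Solves (1 - 4w) z^4 - 6 z^2 + 3 = 0 as a quadratic in z^2."""
    a, b, c = 1 - 4 * w, -6.0, 3.0
    if a == 0:                       # w = 1/4: the equation is linear in z^2
        roots_sq = [c / -b]
    else:
        disc = b * b - 4 * a * c     # equals 24 (1 + 2w)
        if disc < 0:                 # only when w < -0.5 (negative total area)
            return []
        roots_sq = [(-b + s * math.sqrt(disc)) / (2 * a) for s in (1, -1)]
    return sorted(math.sqrt(r) for r in roots_sq if r > 0)


for w in (0.01, 0.25, 1.0, 10.0):
    points = neutral_points(w)
    checks = [round(beta2_with_weights(w, z), 9) for z in points]
    print(f"w = {w:>5}: z = {[round(z, 3) for z in points]}, beta_2 = {checks}")
```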

At √3σ from the mean, changes in area are likely to have their maximum effect on β₂, except for locations some distance out in the tails.
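Under the same assumed form of equation (2), β₂ = (3 + 2wz⁴)(1 + 2w)/(1 + 2wz²)², the effect of a very small weight is the derivative at w = 0, which works out to dβ₂/dw = 2(z⁴ − 6z² + 3). Among points near the center this is largest in magnitude at z = √3, where it equals −12 (a small weight there lowers β₂ fastest); beyond z² = 3 + √12, about 2.54σ, the positive tail effect exceeds it in size, which matches the exception for the tails. The sketch checks the location by a grid search and the slope by a finite difference; the names are ours.

```python
import math


def beta2_with_weights(w, z):
    """Assumed form of equation (2): weight w added at each of -z and +z."""
    return (3 + 2 * w * z**4) * (1 + 2 * w) / (1 + 2 * w * z**2) ** 2


def sensitivity(z):
    """d(beta_2)/dw at w = 0: effect of a very small weight at -z and +z."""
    return 2 * (z**4 - 6 * z**2 + 3)


# Grid search over 0 <= z <= 2.5 for the largest (most negative) effect
grid = [i / 1000 for i in range(2501)]
z_star = min(grid, key=sensitivity)
print(f"largest effect near z = {z_star:.3f} (sqrt(3) = {math.sqrt(3):.3f}), "
      f"slope {sensitivity(z_star):.2f}")

# Cross-check the analytic slope with a forward difference at z = sqrt(3)
h = 1e-6
numeric = (beta2_with_weights(h, math.sqrt(3)) - 3) / h
print(f"finite-difference slope at sqrt(3): {numeric:.4f}")
```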
Explorations in the present study were limited to adding (or subtracting) weights to a normal curve in such a way as not to disturb symmetry. Unlimited possibilities of combinations of changes were not investigated. Further, the weights were added only at points. Some work has also been done on the effect of raising (or lowering) the normal curve over certain ranges. The results of that analysis will be published elsewhere; in general they confirm and extend the findings reported herein. A symmetrical distribution need not be even approximately normal to give β₂ = 3.
THE WEIGHTING OF TESTS MEASURING THE SAME
FUNCTION IN TERMS OF THEIR LENGTH 


TEOBALDO CASANOVA 
University of Puerto Rico 


The formulas for the weighting of tests to be combined into sums or averages in terms of their validities, reliabilities, or other criteria have not been simplified so that they may be used by non-technical workers. In most practical situations test validities and reliabilities are not easily available, and frequently school administrators and other workers in the field assign weights to tests which, although numerically arbitrary, are claimed to be based on a variety of reasons which range from the very trivial to quite important ones. It is the purpose of this article, as the title indicates, to provide a simple method of weighting tests to be combined into a sum, in terms of their length, which, although designed for tests measuring the same function, may be expected to yield approximately correct results when dealing with tests measuring similar functions, such as those given in a school course of rather uniform subject matter.

Kelley¹ has given the formula

$$w = c\,\frac{\sqrt{r_{11}}}{1 - r_{11}}$$

for the appropriate weighting of tests to be combined when the results are expressed in standard scores. w is the weight, r₁₁ the reliability of the test, and c is a constant which need not be considered here because the expression

$$\frac{\sqrt{r_{11}}}{1 - r_{11}} \qquad (1)$$

is sufficient to give the ratio of the weight of one test to that of any other test, when both measure the same thing. The relationship between the length of a test and its reliability is expressed through the well-known Spearman-Brown formula:

$$r_{11} = \frac{nr}{1 + (n-1)r} \qquad (2)$$


Here r₁₁ has the same meaning as in (1), r is the reliability of a shorter test, and n is the ratio of the length of the longest test to that of the shortest one. Squaring (1) and substituting in it the value of r₁₁ given by (2),

$$w^2 = \frac{nr\,[1 + (n-1)r]}{[1 + (n-1)r - nr]^2},$$

which upon simplifying becomes

$$w^2 = \frac{r^2}{(1-r)^2}\,n^2 + \frac{r}{1-r}\,n. \qquad (3)$$

¹ Kelley, T. L., Interpretation of Educational Measurements. World Book Co., 1927, p. 213.


If an arbitrary unit of time is adopted, e.g., a 15-minute period, and if the reliability of a test of that length, r, is known or may be safely assumed, the weight of a test n times that long, w, may be obtained from (3). This equation is a quadratic in the variables w and n, r being constant for the group of tests to be combined into a sum or average. It represents a hyperbola whose upper vertex is at the origin, and the coordinates of whose center are

$$n = -\frac{1-r}{2r}, \qquad w = 0.$$

In order to facilitate the tracing of the hyperbola it is convenient to translate the origin of the coordinate axes to its center. Thus, the origin is to be moved along the n-axis from the point n = 0 to the point n = −(1 − r)/(2r). This can be easily accomplished by substituting n′ − (1 − r)/(2r) for n in (3). Doing so, (3) is transformed, in terms of the variables w′ and n′, into

$$w'^2 = \frac{r^2}{(1-r)^2}\,n'^2 - \frac{1}{4},$$

or

$$\frac{n'^2}{\left(\frac{1-r}{2r}\right)^2} - \frac{w'^2}{\left(\frac{1}{2}\right)^2} = 1. \qquad (4)$$

This equation in standard form may be used for tracing a hyperbola for each value of r, and then reading from the graph the value of w corresponding to each value of n in equation (3). However, if n = 1, w equals unity in (3) only when r equals .38 or 2.62. In a simple system of weighting it is highly desirable that w = 1 when n = 1, whatever the value of r.

FIG. 1. WEIGHTS OF TESTS ESTIMATED FROM THEIR LENGTHS
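The sketch below takes expression (1) to be √r₁₁/(1 − r₁₁), the form whose square, after substituting the Spearman-Brown value (2), yields equation (3). It checks numerically that (3) agrees with that substitution, and that for n = 1 the weight from (3) reduces to √r/(1 − r), which equals 1 only when r² − 3r + 1 = 0, i.e. r = (3 ± √5)/2 (about .38 and 2.62). Function names are ours.

```python
import math


def spearman_brown(n, r):
    """Equation (2): reliability of a test n times as long as one with reliability r."""
    return n * r / (1 + (n - 1) * r)


def kelley_ratio(r11):
    """Expression (1), as reconstructed: weight ratio sqrt(r11) / (1 - r11)."""
    return math.sqrt(r11) / (1 - r11)


def weight_eq3(n, r):
    """Equation (3): w = sqrt(r^2 n^2 / (1 - r)^2 + r n / (1 - r))."""
    return math.sqrt(r**2 * n**2 / (1 - r) ** 2 + r * n / (1 - r))


# (3) agrees with substituting (2) into (1)
for n, r in [(1, 0.5), (3, 0.5), (8, 0.2)]:
    print(n, r, round(weight_eq3(n, r), 4), round(kelley_ratio(spearman_brown(n, r)), 4))

# For n = 1, (3) gives w = 1 only at r = (3 - sqrt(5)) / 2 among admissible reliabilities
r_low = (3 - math.sqrt(5)) / 2
print(f"r = {r_low:.2f}: w = {weight_eq3(1, r_low):.6f}")
```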


Since, for n = 1, (3) gives w² = r/(1 − r)², such an equation is obtained by multiplying each one of the terms in n in (3) by the reciprocal of this last value. Doing so,

$$w^2 = rn^2 + (1-r)n. \qquad (5)$$
* Kelley's well-known formula σ₁²(1 − r) = σₙ²(1 − r₁₁) should not be used to determine the relative weights of tests from their reliabilities r and r₁₁, because it was derived for comparing the reliabilities obtained from the same test on samples of varying degrees of variability, a situation different from the one described here. If, as in the derivation of (5), σ₁, the standard deviation of a unit-of-time test with reliability r, is equal to 1.00, that formula combined with (2) gives σₙ = √[1 + (n − 1)r]. But from (5), w = √(n[1 + (n − 1)r]), or an effective weight √n times the one obtained above.




Each value of w calculated through (5) is (1 − r)/√r times the corresponding value given by (3). Therefore, for any given value of r, the ratio that the weight of a test of a given length bears to the weight of a test of a different length has not been changed through the last operation. Of course, the shape of the hyperbola was changed because its slope at any point has been multiplied by the constant (1 − r)/√r. But the vertex of the upper branch is still at the origin and the center remains at the point n = −(1 − r)/(2r), w = 0.


Table I gives the values of w for values of n from 1 to 12, and for values of r from .20 to .80, as obtained from (5). In Figure 1, the positive side of the upper branch of a family of non-concentric hyperbolas, all passing through the points (1, 1) and (0, 0), represents equation (5) for different values of r. The graph may be used for values of n that are not whole numbers.
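Equation (5) is simple enough to tabulate directly. The sketch below prints values of the kind given in Table I for n = 1 to 12; the particular r columns are our choice for illustration, and the only entries we check against the article are the r = .50 values quoted later in the text (1.00, 2.45, 3.16, and 6.00 for n = 1, 3, 4, and 8).

```python
import math


def weight_eq5(n, r):
    """Equation (5): effective weight w = sqrt(r n^2 + (1 - r) n), scaled so
    that a unit-of-time test (n = 1) always receives w = 1."""
    return math.sqrt(r * n**2 + (1 - r) * n)


rs = [0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]  # illustrative columns
print("  n" + "".join(f"{r:>7.2f}" for r in rs))
for n in range(1, 13):
    print(f"{n:>3}" + "".join(f"{weight_eq5(n, r):>7.2f}" for r in rs))
```

Since n enters (5) as a continuous variable, `weight_eq5(2.5, 0.5)` works just as well, matching the remark that the graph may be read for fractional n.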

In a series of tests of different lengths on the same or similar subject matter, one may assume a basic reliability, r, for a unit-of-time test, say, a 15-minute test. In this case the value of n for each test is equal to its length in minutes divided by 15. Since formula (1) was derived for standard scores, the value of w given in the table and in the graph is the effective weight of the test, that is, its standard deviation after weighting. Therefore, the multiplier mᵢ, or the nominal weight for test i, is given by

$$m_i = \frac{w_i}{\sigma_i}, \qquad (6)$$

σᵢ being the standard deviation of test i before weighting.

After each raw score in each test i has been multiplied by the nominal weight mᵢ, and the weighted scores for each subject have been added, the standard deviation, or effective weight, of the sums of the weighted scores may be easily estimated. Remembering that the w's are the standard deviations of the tests after weighting, and letting σ_sw stand for the standard deviation of the sums of the weighted scores,

$$\sigma_{sw}^2 = \sum_{i=1}^{N} w_i^2 + \sum_{i=1}^{N}\sum_{j=1}^{N} w_i w_j r_{ij} \quad (i \neq j) \qquad (7)$$

N is the number of tests and rᵢⱼ is the correlation between any two tests. Since these measure the same function, their correlation corrected for attenuation may be assumed to be equal to unity. Then,

$$r_{ij} = \sqrt{r_i\, r_j}, \qquad (8)$$

rᵢ and rⱼ being the reliabilities of tests i and j. By the Spearman-Brown formula,

$$r_i = \frac{n_i r}{1 + (n_i - 1)r} \quad\text{and}\quad r_j = \frac{n_j r}{1 + (n_j - 1)r},$$

in which nᵢ and nⱼ have the same meaning as before with respect to the tests indicated by the subscripts. Hence,

$$r_{ij} = \frac{r\sqrt{n_i n_j}}{\sqrt{[1 + (n_i - 1)r]\,[1 + (n_j - 1)r]}}. \qquad (9)$$

TABLE I 
WEIGHTS OF TESTS OF VARIOUS LENGTHS AND RELIABILITIES 
But from (5),

$$w_i = \sqrt{n_i[1 + (n_i - 1)r]} \quad\text{and}\quad w_j = \sqrt{n_j[1 + (n_j - 1)r]}. \qquad (10)$$

From (8), (9), and (10),

$$w_i w_j r_{ij} = r\,n_i n_j.$$

Substituting this in (7),

$$\sigma_{sw}^2 = \sum_{i=1}^{N} w_i^2 + r\sum_{i=1}^{N}\sum_{j=1}^{N} n_i n_j \quad (i \neq j).$$

From (5),

$$\sum_{i=1}^{N} w_i^2 = r\sum_{i=1}^{N} n_i^2 + (1-r)\sum_{i=1}^{N} n_i,$$

so that

$$\sigma_{sw}^2 = r\Big(\sum_{i=1}^{N} n_i\Big)^2 + (1-r)\sum_{i=1}^{N} n_i. \qquad (11)$$

Since the sums represent scores on a test whose length is Σnᵢ, (11) can be obtained by substituting Σnᵢ for n in (5), and the value of σ_sw can be obtained from Table I or from Figure 1, if Σnᵢ is within their range. It must not be forgotten, however, that σ_sw as given by (11) is an estimate whose accuracy depends on the degree to which the assumptions involved in (8) and (9) are fulfilled. The correct value may be obtained from (7), which requires the actual intercorrelations among the tests.
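The difference between the exact value from (7) and the estimate from (11) can be computed for the first case described for Table II: a unit-of-time test and one eight times as long, intercorrelated .8575. At r = .80 the sketch gives 8.14 by (7) and 8.16 by (11); it also confirms that when the intercorrelations follow (9) exactly, the two formulas coincide. Function names are ours.

```python
import math


def weight_eq5(n, r):
    """Equation (5): effective weight (SD after weighting) of a test n units long."""
    return math.sqrt(r * n**2 + (1 - r) * n)


def sd_sum_exact(ns, r, corr):
    """Equation (7): SD of the weighted sums from the effective weights and the
    actual intercorrelations; corr[i][j] correlates tests i and j."""
    w = [weight_eq5(n, r) for n in ns]
    k = len(ns)
    var = sum(wi**2 for wi in w)
    var += sum(w[i] * w[j] * corr[i][j] for i in range(k) for j in range(k) if i != j)
    return math.sqrt(var)


def sd_sum_estimate(ns, r):
    """Equation (11): the same SD estimated from the total length alone."""
    return weight_eq5(sum(ns), r)


def corr_eq9(ni, nj, r):
    """Equation (9): intercorrelation implied by assumption (8)."""
    return r * math.sqrt(ni * nj) / math.sqrt((1 + (ni - 1) * r) * (1 + (nj - 1) * r))


ns = (1, 8)
actual = [[1.0, 0.8575], [0.8575, 1.0]]
print(f"r = .80: (7) gives {sd_sum_exact(ns, 0.8, actual):.2f}, "
      f"(11) gives {sd_sum_estimate(ns, 0.8):.2f}")

# With intercorrelations taken from (9), (7) reduces to (11)
ideal = [[1.0 if i == j else corr_eq9(ns[i], ns[j], 0.5) for j in range(2)] for i in range(2)]
print(sd_sum_exact(ns, 0.5, ideal), sd_sum_estimate(ns, 0.5))
```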


An idea of the degree of approximation of the value of σ_sw calculated through (11) may be had from Table II. It includes two hypothetical cases. In the first one a unit-of-time test is combined with a test eight times that long, their intercorrelation being .8575. The entries are the standard deviations of the weighted sums calculated by formula (7), and those estimated through (11), for several values of r. It will be observed in this as in the second example that
the approximations are rather close, and that the accuracy of estimate depends on how near to the actual value is that assigned to r.* In the first case, an r of .80 gives the most accurate estimate, while in the other example, the closest value is obtained by assuming an r of .50.

* The value of r should not be confused with that of the intercorrelation rᵢⱼ.

TABLE II
COMPARISON OF ESTIMATED AND CALCULATED STANDARD DEVIATIONS OF WEIGHTED SUMS

First case: n₁ = 1, n₂ = 8

              r = .70    r = .80
By (7)          7.75       8.14
By (11)         7.71       8.16

In a series of similar tests on the same subject given to the same individuals, one may select the appropriate r if he knows whether he is dealing with tests that are highly reliable on the average, or with tests of medium or low reliability. Suppose, for example, that four such tests given to a group of children may be properly classified as tests of medium reliability. If a unit of time of 15 minutes is adopted and r is assumed to equal .50 for a test of that length, the values of n for four tests with lengths of 15, 45, 60, and 120 minutes will be respectively 1, 3, 4, and 8. In the same order, the w's given in Table I are 1.00, 2.45, 3.16, and 6.00. These are the effective weights, or the standard deviations of the tests after weighting. Suppose that the standard deviations before weighting are 1.60, 3.50, 6.40, and 14.80. Then, from (6), the multiplier m₁, or nominal weight for the first test, is equal to 1.00 divided by 1.60, or .62. Likewise m₂ = 2.45 ÷ 3.50 = .70, m₃ = .49, and m₄ = .41.

After the raw scores on all of the tests are multiplied by their nominal weights, the weighted scores of each individual are added, and a distribution of weighted sums is obtained. Since the mean of the weighted scores for each test is equal to the mean of the raw scores multiplied by the corresponding nominal weight, the mean of the weighted sums for the four tests, M_sw, is naturally given by

$$M_{sw} = .62\,M_1 + .70\,M_2 + .49\,M_3 + .41\,M_4,$$

where the M's are the means of the raw scores on the tests indicated by the subscripts. The standard deviation of the distribution of weighted sums, σ_sw, is estimated by substituting in (11). For the above data, Σnᵢ = 16 and r = .50, so that

$$\sigma_{sw} = \sqrt{.50 \times 16^2 + .50 \times 16} = 11.66.$$


It is important to note here that neither the standard deviations of the tests before weighting nor the weights assigned to the tests are required for the estimation of this last value, so that if this is the only quantity needed, it may be readily obtained from the lengths of the tests combined, after assuming a suitable value for r.* Thus, in combining the results of similar tests given in a school course of rather uniform subject matter for the purpose of assigning class marks, the limits of the letter grades may be approximately estimated before weighting and adding the individual scores.
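The worked example above can be strung together in a few lines. The sketch below (function and variable names are ours) converts lengths to n with a 15-minute unit, takes the effective weights from (5), divides by the given standard deviations to get the nominal weights of (6), and applies (11) to the total length.

```python
import math


def weight_eq5(n, r):
    """Equation (5): effective weight (SD after weighting) of a test n units long."""
    return math.sqrt(r * n**2 + (1 - r) * n)


UNIT_MINUTES = 15                      # arbitrary unit of time adopted in the example
r = 0.50                               # assumed reliability of a 15-minute test
lengths = [15, 45, 60, 120]            # test lengths in minutes
sds_raw = [1.60, 3.50, 6.40, 14.80]    # standard deviations before weighting

ns = [length / UNIT_MINUTES for length in lengths]     # 1, 3, 4, 8
ws = [weight_eq5(n, r) for n in ns]                    # effective weights
nominal = [w / sd for w, sd in zip(ws, sds_raw)]       # equation (6): m_i = w_i / sigma_i
sd_sums = weight_eq5(sum(ns), r)                       # equation (11) with sum(n_i) = 16

print("effective weights:", [round(w, 2) for w in ws])
print("nominal weights:  ", [round(m, 2) for m in nominal])
print("SD of weighted sums:", round(sd_sums, 2))
```

As the text notes, `sd_sums` depends only on the lengths and the assumed r; `sds_raw` is needed only for the nominal weights.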

* Since the unit of time is arbitrary, results obtained through different systems of weighting are not comparable.
MACHINE METHODS OF HANDLING LARGE CLASSES


JOHN G. WATKINS


More and more of our young college in- 
structors are being drawn directly into the 
war effort. This frequently calls for the con- 
solidation of classes into others of larger size. 
With reduced funds available for assistant- 
ships, a greater burden is placed on the re- 
maining instructors in the testing and han- 
dling of class records. In this article a system 
is described whereby the scoring of tests, the 
making of frequency distributions, the post- 
ing of scores, and the recording of absences 
have been turned over to machines. This 
results in freeing the teacher for the more 
professional duties of keeping abreast of his 
field and enriching his courses. The method 
is especially well adapted to large lecture 
sections. It presupposes that punched card and electric test scoring machines are available.


CLASS ROLLS


In place of the usual class roll received 
from the registrar’s office, a complete set of 
class cards is furnished to the instructor, on 
each of which have been punched the stu- 
dent’s name, number, and any other data the 
teacher may desire, but leaving enough un- 
punched fields to permit the recording of all 
daily grades, test scores, and absences. Use 
of the registrar’s student number, or assigning 
class numbers to the students in alphabetical 
order, will facilitate the arrangement of the 
cards later. Instead of transferring the names 
by ink to a class record book, as is usually 
done, the names can be printed on a loose leaf 
record sheet from the class cards by an 
accounting machine. 


RECORDING ABSENCES 


Let us suppose that there are one hundred thirty-five students in this class. They are
seated by assigned class number, registrar’s 
number, alphabetically, or in any other man- 
ner, and the cards are sorted by the machine 
so that they are arranged in the same order. 
When roll is taken, the instructor or assistant 
need only note vacant seats and mark accord- 
ingly on the student’s cards. A mark is made 
in space 1 of a certain field to indicate the 
first absence for the student. If a type of card 
is used which provides for Mark Sensing, and
the Mark Sense Reproducing Attachment is 
available, the cards, when fed through the 
machine, will be automatically punched in 
space 1 in that field. The next time the stu- 
dent is absent the roll-taker marks space 2 in 
that column, which is accordingly punched by 
the machine. Usually sending the cards 
through the machine once a week for punch- 
ing and sorting will record all absences and 
indicate those students who are approaching 
or have passed the limit of permitted ab- 
sences. Excused absences can be punched in 
an adjacent column. Thus a complete ab- 
sence report can be had at any time, merely 
by running the cards through the tabulating 
machine and printing records, which, of 
course, can be made in duplicate, triplicate, 
etc. This work, for the instructor, usually 
involves a tedious searching of the records 
and writing of the report, when machine 
methods are not used. 


TESTING 


Testing is of the objective type, and stu- 
dents record their answers on the electric 
machine-scoring answer sheets. These can be 
scored in a very short time by the Test Scor- 
ing Machine. If the machine is near the class 
room it is possible to give a short test during 
the first fifteen minutes of a period and have 
it back in thirty minutes all scored, the 
grades recorded, a frequency distribution 
prepared, and records of students with their 
scores arranged both alphabetically and in 
order from high to low. In addition, the 
Electric Test Scoring Machine can furnish an 
item count, recording the number of students 
getting each item correct. This gives the 
teacher a chance to do remedial instruction 
by spending further time on only those items 
which the class did not know very well. 
If papers have been arranged in order of 
total score, and separate item counts made of 
upper and lower groups, it is possible to read 
directly from the Flanagan charts (1) a bi- 
serial correlation coefficient for each ques- 
tion. The instructor is thus enabled to retain 
strong items and eliminate weak ones in 
future tests. Item analysis has seldom been 
practiced in college classroom testing in the
past, because the labor involved in doing 
this by hand methods was prohibitive. 


HANDLING OF TEST PAPERS FOR QUICK SCORING


If test papers are to be scored rapidly, and 
the grades and item counts returned during 
the same period, it is necessary, of course, 
that an efficient system of handling be estab- 
lished. The papers should be collected from 
the class according to seating arrangement, 
so that they will be in the same order each 
time. Students will usually cooperate in this, 
once the system has been explained to them. 
The papers are then immediately rushed to 
the machine operator, along with the punched 
class cards, which have previously been 
sorted in the same order. The cards are 
placed in a Card Punch which is adjacent to 
the Test Scoring Machine. The operator 
reads the score from the Test Scoring Machine and punches it directly on the card. It is not even necessary to record the score on the test paper as is usually done. Tabs on the punching machine are set so that as soon as one card is punched it automatically ejects
the card, brings in the next one, and stops 
with the correct field in place for punching 
the next score. No manual handling of the 
cards at this point is necessary. Absences 
will be punched by the Mark Sensing Attach- 
ment, and those cards can be immediately 
ejected. As soon as the papers have been 
scored and the cards punched, the cards are 
sent first to the Sorting Machine and then 
to the Tabulating Machine, while the Test 
Scoring Machine operator runs the papers 
through again for an item count. If the 
papers are to be scored and the records re- 
turned within the same class period, it will be 
essential that the instructor have previously 
furnished the machine operator with the key 
to the test, so that the scoring stencil can be 
punched and the item count board plugged 
in advance. The Card Punch will also have 
been previously set up. The machine operator will sort the cards in order, alphabetically or by class number, and post a list on the Tabulating Machine in duplicate form.
This will furnish one copy to be placed on 
the bulletin board and another for the in- 
structor’s records. The cards will then be 
sorted in order from high to low, and another 
list posted, thus furnishing a raw-score frequency distribution of the class. If the tabulating machine's plug-board has been plugged
in advance, the making of these records will 
take only a few minutes, and should be fin- 
ished as soon as the item count is ready. All 
records can then be rushed back to the class 
within half an hour after the papers were 
first received. The tabulating machine can be 
wired to report the grades by class number, 
not printing the student’s name from his 
card, if the instructor wishes to save the low- 
grade student embarrassment. 


Probably most schools will not find it 
feasible to provide this test scoring and tabu- 
lating service to the instructor as rapidly as 
described. The demands on the machine 
office are frequently such that it may be 
twenty-four hours or more before the tests 
can be conveniently handled. However, the 
steps necessary for rapid treatment of the 
papers, which have been outlined here, will 
be essentially the same whether the instructor 
gets his grades back in half an hour or the 
next day. 


All too frequently the papers of a large 
class are not graded, grades returned, and the 
results discussed in class, until many days 
have elapsed, because of the demands on the 
instructor and his assistants when hand scor- 
ing and tabulating methods are used. By 
machine methods it is possible to return a 
complete report of the test results within the 
hour. Busy instructors will welcome this
emancipation from routine bookkeeping. 


FINAL RECORDS


At the end of the semester or quarter, 
final averages, records of grades, and num- 
ber of absences can be posted easily by the 
machines from the class cards and returned 
to the Registrar’s Office. The original class 
cards can also be returned, and the class 
record of each student quickly punched on 
his permanent record card by the Automatic 
Reproducing Punch. 
FISHER’S t-TEST AS A SPECIAL CASE OF HIS z-TEST


P. J. RULON 
Harvard University 


The teaching of the techniques of analysis 
of variance, as represented by the commoner 
texts addressed to students of psychology 
and education, proceeds without showing 
very clearly any explicit general scheme 
running throughout the various procedures. 
The result is often a confusion in the mind 
of the student as to the justification for the 
procedures, and questions of “how many de- 
grees of freedom” are frequently raised by 
the student even after he has given some time 
to the study of the subject. An approach to 
the job of synthesizing the various applica- 
tions of the analysis of variance has been 


made by Jackson,¹ building upon the work of Kolodziejczyk² and Johnson and Neyman.³ By means of the approach used by these workers, the t-test of Fisher is seen to be a special case of the same author's z-test. It is common for writers to point out the relationship between t and z, but it is unfortunately not common for them to show the common source of the two tests. It is the purpose of this paper to apply the approach to deriving the z-test in a situation which yields the special case of the t-test, and to show that the t-test comes therefrom.
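The relationship the paper sets out to derive can be checked numerically beforehand. For two groups, Fisher's z is half the natural logarithm of the ratio F of the between-groups mean square (1 degree of freedom) to the within-groups mean square (N − 2 degrees of freedom), and that ratio equals t², so z = ln|t|. The sketch below computes both from hypothetical scores invented for illustration, using only the standard library; the function names are ours.

```python
import math


def pooled_t(g1, g2):
    """Fisher's t for the difference between two group means (pooled variance)."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss_within = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)
    s2 = ss_within / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(s2 * (1 / n1 + 1 / n2))


def variance_ratio(g1, g2):
    """Between/within mean-square ratio for two groups (1 and N - 2 d.f.)."""
    n1, n2 = len(g1), len(g2)
    N = n1 + n2
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    grand = (sum(g1) + sum(g2)) / N
    ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
    ss_within = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)
    return (ss_between / 1) / (ss_within / (N - 2))


group1 = [12, 15, 11, 14, 13, 16]   # hypothetical scores, Group 1
group2 = [10, 9, 13, 11, 8]         # hypothetical scores, Group 2

t = pooled_t(group1, group2)
F = variance_ratio(group1, group2)
z = 0.5 * math.log(F)               # Fisher's z
print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {F:.4f}, z = {z:.4f}, ln|t| = {math.log(abs(t)):.4f}")
```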

The kind of situation in which either the t-test or the z-test is applicable is that in which the individual's score, Xᵢₜ, may be considered a linear function (made up of, due to, or represented by, a linear combination) of two or more different factors (elements, causes, or parts). For example, the initial scores of n₁ + n₂ = N children in the two groups of a parallel-group experiment could be represented as follows:

X₁₁ = the score of the first child in the first group,

X₁₂ = the score of the second child in the first group,

. . .

X₁ₙ₁ = the score of the last pupil in the first group, there being n₁ pupils in Group 1.

For the second group, the scores would be X₂₁, X₂₂, ..., X₂ₙ₂. In general, then, these scores are written Xᵢₜ, where i = 1, 2, and t = 1, 2, 3, ..., nᵢ, nᵢ being the number of pupils in the ith group.

¹ Jackson, R. W. B., ... the Analysis of Variance ... Educational Problems. Toronto: Department of Educational Research, University of Toronto, 1940. 103 pp.
² Kolodziejczyk, Stanislaw, "On an Important Class of Statistical Hypotheses," Biometrika, Volume 27 (1935), pp. 161-.
³ Johnson, P. O., and Neyman, J., "Tests of Certain Linear Hypotheses and Their Application to Some Educational Problems," Statistical Research Memoirs, Volume 1 (1936), pp. 37-93.

It is supposed that in the population from which these pupils were drawn, the scores Xᵢₜ can be represented by

$$X_{it} = \alpha + \beta_i + \epsilon_{it},$$

where the values of α and βᵢ are not known, but α is the factor common to all pupils and βᵢ is the group factor common to the pupils in the ith group. εᵢₜ is the individual factor, different for every individual. The tests derived are exact if in successive samples εᵢₜ is normally distributed about zero with the sampling standard deviation of ε₁₁ equal to that for ε₁₂, and equal to that for every other εᵢₜ.


The procedure is to write for the sample an analogous expression,

$$X_{it} = A + B_i + z_{it}, \qquad [1]$$

where A and Bᵢ are still unknowns, but are to be determined so as best to fit the sample. The adjustment of A and the Bᵢ is the principal part of the procedure which follows. zᵢₜ is the individual factor and is not necessarily normally distributed in the sample. Our adjustment of A and the Bᵢ will turn out to be such that the mean zᵢₜ in the sample will be zero, and the standard deviation of z₁ₜ in the sample should be equal to that of z₂ₜ within the limits of sampling errors, for that is the test of the equality of the sampling standard deviations of εᵢₜ discussed above.

The adjustment of A and Bᵢ is carried out in such a way as to minimize the variance of zᵢₜ in the sample. From [1] we obtain

$$z_{it} = X_{it} - A - B_i.$$

Actually (but it is the same thing), what is minimized is the sum of the squares of the z's. We set

$$\chi'^2 = \sum_i \sum_t z_{it}^2$$

and so minimize

$$\chi'^2 = \sum_i \sum_t (X_{it} - A - B_i)^2. \qquad [2]$$
it 


The minimization is with respect to the unknowns $A$, $B_1$, and $B_2$. A straightforward approach to this minimizing would involve differentiating $\chi^2$ with respect to $A$, then with respect to $B_1$, and finally with respect to $B_2$, setting the resulting three partial derivatives equal to zero and solving the three simultaneous equations for the three unknowns $A$, $B_1$, and $B_2$. But this approach leads to difficulties. Differentiating [2] with respect to $A$ gives

$$\frac{\partial \chi^2}{\partial A} = -2 \sum_{it} (X_{it} - A - B_i).$$

Setting this equal to zero, dividing through by $-2$, and distributing the summations, gives

$$\sum_{it} X_{it} - \sum_{it} A - \sum_{it} B_i = 0,$$

or

$$\sum_{it} X_{it} - NA - \sum_i n_i B_i = 0. \qquad [3]$$


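Next differentiating $\chi^2$ with respect to $B_1$ gives

$$\frac{\partial \chi^2}{\partial B_1} = -2 \sum_t (X_{1t} - A - B_1),$$

from which, as above,

$$\sum_t X_{1t} - \sum_t A - \sum_t B_1 = 0,$$

or

$$\sum_t X_{1t} - n_1 A - n_1 B_1 = 0. \qquad [4]$$

In the same way, differentiating with respect to $B_2$, setting equal to zero, etc., gives

$$\sum_t X_{2t} - n_2 A - n_2 B_2 = 0. \qquad [5]$$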

The three equations [3], [4], and [5] are the ones we would naturally try to solve for the three unknowns $A$, $B_1$, and $B_2$. Unfortunately, however, the three equations are not independent, as can be seen by adding together equations [4] and [5]. Remembering that $n_1 + n_2 = N$, this yields

$$\sum_{it} X_{it} - NA - \sum_i n_i B_i = 0.$$

This is exactly equation [3], so all the relationships among the three unknowns are described by equations [4] and [5], and we have only two independent equations with three unknowns. To get a solution, we need another equation involving the unknowns.
This can be obtained by "placing a restriction" upon the $B$'s. A natural proposal is to require that the group factor $B_i$ be measured as a deviation from the general factor $A$. This is a common procedure in statistical work and we know that it does not restrict the usefulness of our results. It amounts to setting

$$\sum_i n_i B_i = 0. \qquad [6]$$

An easy way to minimize [2], subject to the restriction [6], is to employ a Lagrange multiplier.* If the function to be minimized is called $F$ and if $\Phi = 0$ is the equation of restriction upon the variables, then write $u = F + \lambda\Phi$, where $\lambda$ is a constant whose value will be determined in the next steps. Our $F$ is given by [2] and our $\Phi = 0$ by [6], so we write

$$u = \sum_{it} (X_{it} - A - B_i)^2 + \lambda \sum_i n_i B_i. \qquad [7]$$

Now we minimize $u$ with respect to $A$, $B_1$, and $B_2$, just as though all three of them were independent and we had not tied the $B$'s together by equation [6]. Lagrange noted that this procedure yields the same solution for $A$ and the $B_i$ as would be yielded by solving [6] for one of the $B$'s and substituting in [2], leaving only two unknowns to solve for in minimizing [2].

From [7], then, we find

$$\frac{\partial u}{\partial A} = -2 \sum_{it} (X_{it} - A - B_i),$$

from which

$$\sum_{it} X_{it} - NA - \sum_i n_i B_i = 0.$$

From [6], the third term of this is equal to zero, so

$$\sum_{it} X_{it} - NA = 0, \qquad [8]$$

$$A = \frac{1}{N} \sum_{it} X_{it} = \bar{X}_{..}, \text{ say.} \qquad [9]$$

* A brief treatment of the method is given in W. F. Osgood, Advanced Calculus, New York: Macmillan, 1928, pp. 180 ff. The notation used here is his.


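Next from [7] we get

$$\frac{\partial u}{\partial B_1} = -2 \sum_t (X_{1t} - A - B_1) + \lambda n_1,$$

which, upon setting equal to zero etc., gives

$$\sum_t X_{1t} - n_1 A - n_1 B_1 - \tfrac{1}{2}\lambda n_1 = 0. \qquad [10]$$

Similarly differentiating [7] with respect to $B_2$, setting equal to zero, dividing through by $-2$, and distributing the summations, gives

$$\sum_t X_{2t} - n_2 A - n_2 B_2 - \tfrac{1}{2}\lambda n_2 = 0. \qquad [11]$$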


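We can now solve for $\lambda$ by adding together equations [10] and [11]:

$$\sum_{it} X_{it} - NA - \sum_i n_i B_i - \tfrac{1}{2}\lambda N = 0.$$

By [8], the first two terms of this together equal zero. By [6] the third term is zero. Therefore the fourth term is zero and $\lambda = 0$. Then from [10],

$$\sum_t X_{1t} - n_1 A - n_1 B_1 = 0,$$

$$B_1 = \frac{1}{n_1} \sum_t X_{1t} - A = \bar{X}_{1.} - \bar{X}_{..}, \qquad [12]$$

and in the same way, from [11], $B_2 = \bar{X}_{2.} - \bar{X}_{..}$.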

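Now substituting the values of $A$ and $B_i$ from [9] and [12] into the expression for $\chi^2$ in [2] gives a minimum value for $\chi^2$. This value is called $\chi_a^2$:

$$\chi_a^2 = \sum_{it} \left[X_{it} - \bar{X}_{..} - (\bar{X}_{i.} - \bar{X}_{..})\right]^2 = \sum_{it} (X_{it} - \bar{X}_{i.})^2. \qquad [13]$$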
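As a check on the algebra, the minimization of [7] can be carried out numerically. The following is a minimal sketch in Python using only the standard library. The scores and the names `solve_linear` and `fit_two_groups` are hypothetical and serve only as illustration. The sketch writes down the four linear equations given by $\partial u/\partial A = 0$, [10], [11], and the restriction [6], then solves them exactly with rational arithmetic. It recovers $A = \bar{X}_{..}$, $B_i = \bar{X}_{i.} - \bar{X}_{..}$, $\lambda = 0$, and $\chi_a^2$ as the within-groups sum of squares.

```python
from fractions import Fraction


def solve_linear(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination in exact arithmetic."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Bring a row with a nonzero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Clear this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_two_groups(group1, group2):
    """Minimize u of [7] for two groups; the unknowns are A, B1, B2 and lambda."""
    n1, n2 = len(group1), len(group2)
    N = n1 + n2
    s1, s2 = sum(group1), sum(group2)
    half = Fraction(1, 2)
    matrix = [
        [N, n1, n2, 0],            # du/dA = 0:  N*A + n1*B1 + n2*B2 = sum of all X
        [n1, n1, 0, half * n1],    # du/dB1 = 0: equation [10] rearranged
        [n2, 0, n2, half * n2],    # du/dB2 = 0: equation [11] rearranged
        [0, n1, n2, 0],            # restriction [6]: n1*B1 + n2*B2 = 0
    ]
    A, B1, B2, lam = solve_linear(matrix, [s1 + s2, s1, s2, 0])
    chi2_a = (sum((x - A - B1) ** 2 for x in group1)
              + sum((x - A - B2) ** 2 for x in group2))   # minimum of [2], i.e. [13]
    return A, B1, B2, lam, chi2_a


# Hypothetical scores for two small groups (group means 12 and 17).
A, B1, B2, lam, chi2_a = fit_two_groups([12, 15, 9], [14, 18, 16, 20])
print(A, B1, B2, lam, chi2_a)   # 104/7 -20/7 15/7 0 38
```

Here $A = 104/7$ is the general mean. $B_1 = 12 - 104/7$ and $B_2 = 17 - 104/7$ are the deviations of the group means from it. The multiplier comes out exactly zero, as the argument above predicts.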


This is seen to be the sum ($\sum_i$) for both groups of the sum ($\sum_t$) within the group of the squares of the deviations of the scores ($X_{it}$) from the group mean $\bar{X}_{i.}$. It is called the sum of squares within groups. With this sum of squares goes a certain number of "degrees of freedom," commonly abbreviated n.d.f. This n.d.f. can be determined explicitly (without resort to intuition) as follows. From the total number $N = n_1 + n_2$ of cases, subtract $s$. Here $s$ is the number of variables with respect to which $\chi^2$ was minimized, reduced by the number of Lagrange multipliers appearing in $u$. We differentiated three times (successively with respect to $A$, $B_1$, and $B_2$), and we used one Lagrange multiplier $\lambda$. Hence in our case $s = 3 - 1 = 2$, and n.d.f. $= N - s = N - 2$. This is frequently written $n_1 + n_2 - 2$, since $N = n_1 + n_2$. And $f_a$ is frequently used to denote the n.d.f. for $\chi_a^2$. In this notation we have $f_a = n_1 + n_2 - 2$. $\chi_a^2$ and $f_a$ will be used together in the expression for $z$ in the $z$-test and also in the expression for $t$ in the $t$-test.

The next step, still following Kolodziejczyk and the others, is to rewrite $\chi^2$ to represent the sum of squares of the individual factors under the null hypothesis. When the $t$-test is being used, the null hypothesis is that the two groups are equal; that is, that the group factor $\beta_i$ is of no importance in the population sampled. More explicitly, the hypothesis is that $\beta_i = 0$, $i = 1, 2$. All the variation in $X_{it}$ is then "due to" no group factor at all, but simply to the general factor $\alpha$ and the individual factors $\omega_{it}$. Analogous to this, we write for the sample,

$$X_{it} = A + z_{it},$$

and

$$\chi^2 = \sum_{it} (X_{it} - A)^2, \qquad [14]$$

and minimize this $\chi^2$ with respect to $A$. We have

$$\frac{\partial \chi^2}{\partial A} = -2 \sum_{it} (X_{it} - A),$$

from which we obtain

$$\sum_{it} X_{it} - NA = 0,$$

$$A = \bar{X}_{..}. \qquad [15]$$
This is the same value we obtained in [9] above. We did not use any Lagrange multiplier this time because we did not need to. We did not need to impose any restriction on the variables (there was only one: $A$) to get the solution. Substituting [15] in [14], we get a relative minimum value for $\chi^2$, called $\chi_r^2$:

$$\chi_r^2 = \sum_{it} (X_{it} - \bar{X}_{..})^2. \qquad [16]$$

This is seen to be the sum, for all pupils in both groups, of the squares of the deviations of the individual scores from the general mean. This $\chi_r^2$ will be used in the binomial $\chi_r^2 - \chi_a^2$ as a term in $z$ for the $z$-test and as part of $t$ in the $t$-test. For the $z$-test, we write

$$z = \frac{1}{2}\log_e \frac{(\chi_r^2 - \chi_a^2)/f_r}{\chi_a^2/f_a}. \qquad [17]$$

For this expression we have determined all of the values except for $f_r$, the n.d.f. for $(\chi_r^2 - \chi_a^2)$. This $f_r$ is always determined explicitly by $f_r = r$, where $r$ is the number of independent equations appearing in the statement of the null hypothesis. In terms of our sample, the equations of our null hypothesis were

$$B_i = 0, \quad i = 1, 2.$$

This represents two equations: one when $i = 1$ and one when $i = 2$. However, equation [6] said

$$n_1 B_1 + n_2 B_2 = 0. \qquad [6']$$

Of course both $n_1$ and $n_2$ are known for the sample, so that if either $B_1$ or $B_2$ is determined, the other $B$ can be found from [6']. Hence there is only one independent equation in the statement of our null hypothesis: $B_i = 0$, with $i$ = either 1 or 2, not 1 and 2. So $f_r = r$ where $r = 1$, so that $f_r = 1$. We have already seen that $f_a = n_1 + n_2 - 2$.


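Before substituting in [17], it will be well to simplify the expression for $\chi_r^2 - \chi_a^2$. From [16] and [13],

$$\chi_r^2 - \chi_a^2 = \sum_{it} (X_{it} - \bar{X}_{..})^2 - \sum_{it} (X_{it} - \bar{X}_{i.})^2.$$

After distributing the summation signs and combining as many terms as possible, this is seen to be

$$\chi_r^2 - \chi_a^2 = \sum_i n_i \bar{X}_{i.}^2 - N\bar{X}_{..}^2. \qquad [18]$$

Now substituting in [17] for $z$ we have

$$z = \frac{1}{2}\log_e \frac{\dfrac{\sum_i n_i \bar{X}_{i.}^2 - N\bar{X}_{..}^2}{1}}{\dfrac{\sum_{it} (X_{it} - \bar{X}_{i.})^2}{n_1 + n_2 - 2}}. \qquad [19]$$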


n, n,—2 


This is the $z$ which would be computed for two groups if the difference between their means were to be tested by the $z$-test. It is a special case of the more general formula for $z$ in the situation in which it is more commonly applied: determining the significance of the differences among a set of $k$ means from $k$ groups, $i = 1, 2, 3, \ldots, k$. The more general formula differs from the above by having

$$f_r = k - 1, \qquad f_a = \sum_i n_i - k.$$

Since in our case $k = 2$, our $f_r = 2 - 1 = 1$, and our $f_a = n_1 + n_2 - 2$. Thus we could have taken the commonly available general formula, set $k = 2$, and come out with [19] as above. But that would not show "why" $f_r = 1$ and $f_a = n_1 + n_2 - 2$, and this is the question students are always asking.
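The pieces above can be combined in a short function. The following is an illustrative sketch, not the author's procedure. It assumes the scores arrive as a hypothetical list of lists, one per group, and the function name `fisher_z` is ours. The function computes $\chi_r^2$ by [16] and $\chi_a^2$ by [13], and checks the simplification [18]. It then returns $z$ from [17] with $f_r = k - 1$ and $f_a = \sum_i n_i - k$.

```python
import math


def fisher_z(groups):
    """Fisher's z for k groups via [16], [13], [18] and [17].

    `groups` is a list of lists of scores, one list per group (hypothetical layout).
    Assumes the group means are not all equal, so that the logarithm is defined.
    """
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    chi2_r = sum((x - grand_mean) ** 2 for g in groups for x in g)          # [16]
    chi2_a = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)    # [13]
    # [18]: chi2_r - chi2_a equals sum of n_i * mean_i^2 minus N * grand_mean^2
    between = sum(len(g) * m ** 2 for g, m in zip(groups, means)) - N * grand_mean ** 2
    assert math.isclose(chi2_r - chi2_a, between, rel_tol=1e-9, abs_tol=1e-9)
    f_r, f_a = k - 1, N - k
    z = 0.5 * math.log(((chi2_r - chi2_a) / f_r) / (chi2_a / f_a))          # [17]
    return z, f_r, f_a


# Three hypothetical groups of three pupils each.
z, f_r, f_a = fisher_z([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_r, f_a)   # 2 6
```

With two groups, the same function returns $f_r = 1$ and $f_a = n_1 + n_2 - 2$, matching [19].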

We will now show that the complex fraction on the right of [19] is equal to $t^2$, the square of the $t$ of Fisher's $t$-test. For this purpose we could start with Fisher's statement⁵ of $t$, but it is rather messy. Instead we will use a rearrangement out of Lindquist,⁶ changing the notation to agree with that used in the preceding discussion, and writing from the formula for $t$:

$$t^2 = \frac{(\bar{X}_{1.} - \bar{X}_{2.})^2}{\dfrac{\sum_{it} (X_{it} - \bar{X}_{i.})^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}. \qquad [20]$$

⁵ See R. A. Fisher, Statistical Methods for Research Workers, Edinburgh: Oliver and Boyd, several editions, but in each, Section 24.1.

⁶ E. F. Lindquist, Statistical Analysis in Educational Research, Boston: Houghton Mifflin, 1940, p. 57.

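The first term of the denominator of this is already recognizable as the denominator of our [19], so it remains only to show that the rest of [20] is the same as the numerator in our [19]. First we note that when there are only two groups, $i = 1, 2$, the general mean $\bar{X}_{..}$ is given by the weighted mean of the two group means:

$$\bar{X}_{..} = \frac{n_1\bar{X}_{1.} + n_2\bar{X}_{2.}}{n_1 + n_2}.$$

Hence

$$\bar{X}_{1.} - \bar{X}_{..} = \bar{X}_{1.} - \frac{n_1\bar{X}_{1.} + n_2\bar{X}_{2.}}{n_1 + n_2} = \frac{n_2(\bar{X}_{1.} - \bar{X}_{2.})}{n_1 + n_2},$$

whence

$$n_1(\bar{X}_{1.} - \bar{X}_{..})^2 = \frac{n_1 n_2^2 (\bar{X}_{1.} - \bar{X}_{2.})^2}{(n_1 + n_2)^2}.$$

Similarly

$$n_2(\bar{X}_{2.} - \bar{X}_{..})^2 = \frac{n_2 n_1^2 (\bar{X}_{1.} - \bar{X}_{2.})^2}{(n_1 + n_2)^2}.$$

Therefore

$$\sum_i n_i(\bar{X}_{i.} - \bar{X}_{..})^2 = \frac{n_1 n_2}{n_1 + n_2}(\bar{X}_{1.} - \bar{X}_{2.})^2. \qquad [21]$$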


Expanding the left side of [21], distributing the summations, and combining terms, gives

$$\sum_i n_i \bar{X}_{i.}^2 - N\bar{X}_{..}^2 = \frac{n_1 n_2}{n_1 + n_2}(\bar{X}_{1.} - \bar{X}_{2.})^2. \qquad [22]$$

The left side of this is the numerator of our [19], so we can rewrite [19], using as numerator the right side of [22]:


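$$z = \frac{1}{2}\log_e \frac{\dfrac{n_1 n_2}{n_1 + n_2}(\bar{X}_{1.} - \bar{X}_{2.})^2}{\dfrac{\sum_{it} (X_{it} - \bar{X}_{i.})^2}{n_1 + n_2 - 2}}. \qquad [23]$$

Move the fraction $n_1 n_2/(n_1 + n_2)$ from the numerator to the denominator, where it becomes $1/n_1 + 1/n_2$. The complex fraction on the right of this then becomes exactly our [20]. Therefore what [23] says is that

$$z = \frac{1}{2}\log_e t^2.$$

From this,

$$z = \log_e t. \qquad [24]$$

This is the relationship commonly pointed out between $z$ and $t$.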
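The identity [24] is easy to confirm numerically. This is a sketch only, with hypothetical scores and function names, using the Python standard library. $t$ is computed from Lindquist's rearrangement [20] and $z$ from [23]. The two agree as $z = \log_e |t|$. The absolute value is needed here only because $t$ carries the sign of $\bar{X}_{1.} - \bar{X}_{2.}$.

```python
import math


def t_two_groups(g1, g2):
    """t from the rearranged form [20], using the pooled within-groups variance."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    chi2_a = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)  # [13]
    return (m1 - m2) / math.sqrt(chi2_a / (n1 + n2 - 2) * (1 / n1 + 1 / n2))


def z_two_groups(g1, g2):
    """z from [23], with f_r = 1 and f_a = n1 + n2 - 2."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    chi2_a = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)
    between = n1 * n2 / (n1 + n2) * (m1 - m2) ** 2        # right side of [22]
    return 0.5 * math.log(between / (chi2_a / (n1 + n2 - 2)))


g1, g2 = [12, 15, 9], [14, 18, 16, 20]   # hypothetical scores
t = t_two_groups(g1, g2)
z = z_two_groups(g1, g2)
print(math.isclose(z, math.log(abs(t))))   # True
```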

In addition to showing that this is the relationship which holds between $z$ and $t$, we have shown the form of the $z$-test appropriate to testing the difference between two means. We have also set forth an explicit method of determining the "number of degrees of freedom" (n.d.f.) when the approach employed by Jackson is used in setting up analyses of variance. Since this approach is applicable to a very wide variety of educational problems, we have shown an explicit method for determining the n.d.f. in a considerable range of situations involving analysis of variance.
THE VALIDITY OF A COMPREHENSIVE EXAMINATION FOR
SCHOLARSHIP AWARDS IN NEW YORK STATE¹
WARREN G. FINDLEY


Assistant Director of Examinations and Testing 
New York State Education Department 


In New York State higher education of 
outstanding pupils is furthered by the award 
from public funds of scholarships that may 
be used at any accredited college in the State. 
The determination of these awards has been 
based on averages of marks earned in certain 
academic subjects on Regents examinations, 
which are statewide subject-matter examina- 
tions prepared and administered under the 
auspices of the State Education Department. 
Specifically, the following examinations and 
weights are currently used in computing 
scholarship averages: 


English Four Years 

American History 

Plane Geometry 

Intermediate Algebra 

Language Three Years or Science 3 


In December 1939 the State Examinations 
Board, an advisory body to the New York 
State Education Department, approved a 
proposal that an experiment be carried on 
over a two-year period to determine the 
relative merits of the present system of 
scholarship awards based on averages of 
marks earned in specified Regents examina- 
tions and a system based on a largely objec- 
tive comprehensive examination designed for 
administration on a single day near the end 
of the high school course. 


Several studies² had shown that the Regents average was equivalent or superior to a variety of other measures in predicting college achievement.

¹ [...] who contributed to this study in important ways [...] individual acknowledgment. The writer wishes to make [...] acknowledgment to Warren W. Knox, Director of [...], New York State Education Department, who as [...] Committee on Scholarships of the State Examinations Board played a leading part in planning the study and in guiding its [...] stages.

² See [...] S. [...], ed. Studies in articulation of high school and college. University of Buffalo Studies, 1934, vol. 9, pp. 125-144. Wood, B. Measurement in higher education. 1923, World Book Co., Yonkers, N. Y., pp. 81-88.

The State Examinations Board, however, had been influenced in its thinking over a period of years by a number of considerations which combined to make it seem advisable that an experiment should be conducted to determine the feasibility of using a new system of awards. The considerations may be summarized as follows:

1. Regents averages are affected by a 
number of extraneous factors: 


A. Examinations taken before the candi- 
date’s senior year cannot be identified 
as scholarship papers and therefore are 
not rerated as such at the State Edu- 
cation Department. 

B. January examinations of a given year 
may be easier or more difficult than 
June examinations. 

C. Examinations in one elective field may 
be easier or more difficult than those 
in another elective field. 


2. Examinations with a fixed passing 
mark of 65 and a maximum possible score 
of 100 cannot discriminate sharply among
scholarship candidates. 

3. Marks in examinations taken as many
as two years before high school graduation 
are not entirely appropriate to use in esti- 
mating a pupil’s competence at the end of 
the high school course. 

4. Requiring scholarship candidates to 
take four specified Regents subjects and one 
elective, chosen from language or science, 
tends to set up as a pattern to be followed 
by all pupils a “fixed” program of academic 
courses. This also tends to discriminate 
against pupils following special curricula in 
fields like music, art, business subjects, etc., 
who would find it impossible in most cases 
to include all the “scholarship subjects” in 
their programs. 

5. Use of averages in Regents examina- 
tions, some of which come in June of senior 
year and are rerated in July, prevents 
announcement of awards until late summer 
and thus prevents many candidates from
making final plans for attending college at a 
reasonable date. 

The group primarily responsible for the 
technical construction of the experimental 
examination determined upon a plan which 
may be said to have been influenced by four 
major guiding principles: 

1. The test should sample the outcomes of 
instruction in the core-curriculum of the 
secondary school (grades 7 through 12) in 
proportion to the weight of the core subjects 
in the total core-curriculum of these grades. 

2. Questions should be based not only on 
the prescribed content of core-curriculum 
subjects, but also on those aspects of these
core areas in which a well-informed and 
alert twelfth-grader might well have acquired 
information or competence as an outgrowth 
of what he acquired through studying the 
core-curriculum subjects, but without special 
instruction in the secondary school. 

3. The test should be as objective as pos- 
sible without sacrificing measurement of 
important outcomes. 

4. All pupils should be required to answer 
all questions without choice. 

Application of the first principle, that the 
core subjects should be represented in this 
examination in proportion to their relative 
weight in the total core-curriculum, is reflected in Table I.


TABLE I

DISTRIBUTION OF WEIGHTS GIVEN TO CORE SUBJECTS IN THE 1940 CURRICULUM OF GRADES 7 THROUGH 12 AND IN THE COMPREHENSIVE HIGH SCHOOL TEST, 1940 FORM

[The entries of this table are not recoverable from the source. Surviving row labels include Health and Safety, Practical Arts, Mathematics, and Art.]

[...] is a social studies topic; half the credit was given for effectiveness of expression (English), while the other half was given for content and organization (social studies).

Application of the second principle resulted in the inclusion of many worth-while questions. That principle held that questions should be based not only on the prescribed content of core-curriculum subjects, but also on aspects of these areas in which scholarship candidates might well have acquired information or competence without specific instruction in the secondary school. The questions it produced tended to make the test more appropriate to the scholarship group and better suited to the purpose of selecting from among them the most outstanding candidates. The samples given below indicate the effect of applying this principle in various fields. Generally speaking, the questions in English and in American history were merely difficult questions on what had been taught in these subject areas, which constitute part of the core-curriculum in the twelfth grade. The questions in science and mathematics were such that a brilliant pupil under good instruction from grade 7 through grade 9 might have been able to answer them correctly at grade 9, but would be more likely to have developed a well-rounded background for answering them by grade 12. The questions in art, music, information about colleges and occupations, and the foreign titles in world literature were more dependent on what pupils might have acquired as an outgrowth of, but beyond, what they acquired through direct instruction. No questions were included in any section of the test that might have been answered readily by pupils who had pursued elective work beyond the core-curriculum minimum in that field, but that covered material about which pupils who had not pursued this advanced study would have had little opportunity to learn.


SAMPLE OBJECTIVE QUESTIONS 


In addition to ordinary multiple-choice 
questions on vocabulary and reading, the 
objective portion of the Comprehensive High 
School Test consisted of questions like these: 
(Correct answers are given at the left) 


3 1) Before the World War (1914-18) the United
States was 1 not dependent on foreign capital
2 abundantly supplied with capital for industrial
development 3 always a debtor nation 4 always
a creditor nation 5 alternately a creditor and a
debtor nation

2) In order to go from New York to England by
the shortest route ships proceed 1 due east
2 south and then east 3 to the Canary Islands
4 [...] the line of a great circle 5 to the no[...]


3) The design of the Lincoln Memorial in Wash- 
ington, D. C., is most accurately described as 
1 classical 2 Egyptian 3 modern 4 Roman- 
esque 5 Gothic 


4) A surface of a packing box at an angle to an 
observer appears 1 curved 2 enlarged 3 un- 
changed 4 foreshortened 5 lower and wider 


5) In which part of a quartet of mixed voices is 
the melody usually found 1 Alto 2 Baritone 
3 Bass 4 Soprano 5 Tenor 

6) How many half-steps are there in an octave? 
1 Eight 2 Ten 3 Twelve 4 Fourteen 5 Six- 
teen 

7) An insect poison which is harmless to human 
beings is 1 Paris green 2 lead arsenate 3 Bor- 
deaux mixture 4 nicotine sulphate 5 Derris root 
8) The sun is a 1 planet 2 satellite 3 meteo- 
rite 4 star 5 galaxy 

9) If the automobile license fee is charged at the 
rate of 50 cents for every 100 pounds or fraction 
thereof, what will an owner have to pay if his 
car weighs 3420 pounds? (1) $17 (2) $17.10 
(3) $17.50 (4) $170 (5) $171 

10) A kitchen is 15’ by 12’. Linoleum costs $1.75 
per sq. yd. If it costs $.10 per sq. yd. to lay it, 
what will it cost to cover the kitchen floor? 
(1) $111 (2) $2 (3) $333 (4) $27.75 (5) $37 
11) The Alhambra, about which Washington 
Irving wrote, was 1 a Knickerbocker village 
2 an English estate 3 a Spanish treasure ship 
4 a Dutch burgher 5 a Moorish palace 

12) The medieval European legend of a man sell- 
ing his soul to the devil is retold in 1 Poe’s The 
Fall of the House of Usher 2 Goethe’s Faust 
3 Shakspere’s Hamlet 4 Irving’s The Devil and 
Tom Walker 5 Milton’s Paradise Lost 
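The two arithmetic items (questions 9 and 10) can be worked as shown below. This is an illustrative check only, in Python. It assumes the plain reading of each item: any fraction of 100 pounds is charged as a full 100 pounds, and the laying charge applies to every square yard covered. Amounts are kept in cents to avoid rounding.

```python
import math

# Question 9: 50 cents per 100 pounds or fraction thereof; the car weighs 3420 pounds.
hundreds = math.ceil(3420 / 100)   # 35: the last 20 pounds count as a full hundred
fee_cents = hundreds * 50
print(fee_cents / 100)             # 17.5, i.e. choice (3) $17.50

# Question 10: a 15 ft by 12 ft kitchen; linoleum $1.75 and laying $.10 per sq. yd.
sq_yd = (15 * 12) // 9             # 180 sq. ft. = 20 sq. yd. (9 sq. ft. per sq. yd.)
cost_cents = sq_yd * (175 + 10)
print(cost_cents / 100)            # 37.0, i.e. choice (5) $37
```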
SAMPLE ESSAY QUESTION, WITH
INSTRUCTIONS

Instructions: At the top of this page you will 
find a paragraph. Treat the paragraph as the intro- 
ductory paragraph of an article (essay). Write be- 
low it in 200-300 words a logically organized devel- 
opment of the ideas expressed in the paragraph. 
You will be marked on the content and organization 
of your writing. You will not be penalized for erasures or other corrections you make. Please hold to
a reasonable standard of legibility. 

Behind and after all the problems of war in these 
times lie the problems of peace. We want peace for 
the opportunities we feel peace brings. The great 
problems, then, are how to achieve a constructive 
peace, how to make peace secure and lasting, and 
how to solve the human problems that will con- 
tinue to exist in a peaceful social order. 

The third principle, that the test should 
be as objective as possible without sacrific- 
ing measurement of important outcomes, led 
to the assignment of five of the six hours 
taken by the examination to the answering 
of 483 multiple-choice questions, and one 
hour to the writing of two brief essays. The 
objective questions were used extensively to 
accomplish as complete and even coverage as 
possible of the many fields included in the 
Comprehensive Test. The essays were used 
to check the candidates’ effectiveness of 
expression. 

The fourth principle, that all pupils 
should answer all questions, was necessary to 
insure comparability of ratings and was jus- 
tified by the fact that all questions were 
based on core-curriculum areas. 

The original questions in each area were 
generally submitted by members of the com- 
mittee which prepares the Regents examina- 
tions in that area. These questions were then 
edited by examiners or supervisors in the 
State Education Department. The problems 
in mathematics were tried out in the 
eleventh-grade classes in three cities to de- 
termine the wrong answers most frequently 
given. With this information it was possible 
to set up the mathematics section in objec- 
tive form, offering as alternative responses to 
each question the right answer and the four 
most common wrong answers. 

When an adequate supply of questions in 
each area had been assembled in desired 
form, a special reviewing committee went 
over all material and did whatever editing 
and selecting seemed desirable, with special 
attention to the specification that questions 
must not favor those who had specialized in 
a particular field. This committee consisted 
of the Director of Secondary Education, the
two persons chiefly responsible for curricu- 
lum development in the Division of Sec- 
ondary Education, and the writer. 

The Comprehensive High School Test, 
1940 Form, as this test was called, was given 
Friday, June 21, 1940, the Friday of Regents 
examination week, to all scholarship candidates who presented themselves for examination. A total of 4024 potential June
graduates, over 90% of the scholarship 
candidates from this group, took this test. 
They included 550 of the 750 who received 
University scholarships in 1940 on the basis 
of Regents averages. Machine-scoreable an- 
swer sheets were used for the objective ques- 
tions and were scored by machine. The two 
essays were written by each candidate in a 
four-page folded booklet, each page 834” X 
11” and ruled with 34” lines. The essays 
were rated by six experienced English exam- 
iners, members of the summer examining staff 
in English chosen for their proven compe- 
tence in rating compositions regularly in- 
cluded in the English Four Years Regents 
examination. 

Analysis of the results showed that 45% of 
the candidates who would qualify for scholar- 
ships on the Comprehensive Test under the 
prevailing county allotment system of scholarship awards were winners on the basis of
the Regents averages now employed. 

In a comparative study, the selection of 
a criterion by which the relative merits of
two bases of award may be judged is not an 
easy matter. In this case it was decided to 
use as a criterion first-year achievement in 
college. No college administrator would like 
to defend the thesis that the average earned 
by a student in his freshman year is the full 
measure of the extent to which he will benefit 
from higher education or of the probable 
quality of the contribution he will make to 
society as a result of receiving a subsidy in 
the form of a scholarship. A four-year aver- 
age is a more promising criterion, especially 
insofar as the first year of the course tends 
to depart little from secondary school work. 
Two limitations, in addition to the consider- 
ation of time, rendered the four-year average 
unsatisfactory as a criterion for this study. 
First, differences in grading standards in dif- 
ferent major fields in upperclass years render 


four-year averages in the same college not 


comparable. Second, student mortality in 
college reduces the number of cases available and results in a narrower range of ability in the group to be studied because the
least competent are dropped. So with due 
appreciation of its limitations, to which ref- 
erence will be made later in terms of the 
findings of this study, the criterion of first- 
year college achievement was used to evalu- 
ate the relative merits of the two bases of 
award, Regents examinations averages and 
Comprehensive Test scores. 


The two bases of award were studied in 
relation to the criterion of first-year college 
achievement in eight different colleges, five 
coeducational, one women’s and two men’s 
colleges. The coefficient of correlation between the Comprehensive Test scores and first-year college averages was compared with the coefficient of correlation between the Regents averages and first-year college averages. Table II shows these correlations and certain others to which reference will be made later. In the coeducational colleges the
correlations were computed separately for 
men students and women students. The data 
for the women students are given at the end 
of the table following the data for men 
students. 


Comparison of the first two correlations in 
each row of Table II reveals that, in general, 
the differences are small and in all compari- 
sons but one (College D—women, where only 
25 cases were involved) are less than twice 
the standard error of the difference. If the 
variation within each column is examined, it 
will be seen that the differences from college 
to college in the size of either coefficient are 
more marked than differences between the 
two predicters of achievement in their abil- 
ity to indicate achievement in a given col- 
lege; if one measure predicts achievement in 
a college well, the other does nearly as well; 
if either shows a low correlation with 
achievement, the other makes an almost 
equally poor prediction. The fact that these 
correlations vary so widely from college to 
college may be traced in part to the small- 
ness of the samples, and the fact that many 
of the values are not greater is partially due 
to the selected character of the groups 
studied, the groups of potential scholarship 
candidates that entered college being supe- 
rior to and less variable than general college 
freshman populations. The most satisfactory 
summarization of these findings, however, is 
TABLE II

CORRELATIONS OF COMPREHENSIVE TEST SCORES, REGENTS AVERAGES, AND PSYCHOLOGICAL EXAMINATION SCORES WITH ACHIEVEMENT IN THE FIRST YEAR OF COLLEGE, 1940-41

[The body of this table is not recoverable from the source. Surviving column headings: College; N; Comprehensive Test, Total; Regents Average; S.E. of Difference¹; Comprehensive Test, Objective. Rows are given first for men and then for women.]

¹ Figures in this column are the standard errors of the differences between the correlations in the two preceding columns.

* College E used the Ohio State Psychological Examination; the other colleges used the American Council Psychological Examination.


that the two measures differ little in predict- 
ing achievement in colleges with somewhat 
diverse standards of achievement. The Com- 
prehensive Test is therefore essentially as 
effective a predicter of first-year college 
achievement as the Regents average now 
being used. 

Comparison of columns 3, 6 and 7 in 
Table II reveals that ability of the Compre- 
hensive Test to predict college achievement is 
largely attributable to the ability of the 
objective portion of the test to predict such 
achievement. To phrase the conclusion as 
before, the objective portion of the Compre- 
hensive Test is essentially as effective a pre- 
dicter of first-year college achievement as the 
total test. Implications of this finding are 
complex, involving considerations of exami- 
nation policy that extend beyond the primary 
function of a scholarship test to select schol- 
arship candidates. 

Comparison of columns 6 and 8 shows 
that the objective portion of the Comprehen- 
sive Test yields consistently higher correla- 
tion with first-year achievement than does 
the standard psychological examination used 
in several of the colleges, indicating that the 
objective portion of the test probably taps 
aspects of fitness for college not tapped by 
psychological examinations. 

As a further check on the relative validity 
of the Comprehensive Test and the Regents 


average, all colleges that had supplied data 
on first-year achievement furnished a second 
average for each student who remained in 
college to the end of the second year in June 
1942. College D reported each student's two-year average; other colleges offered a second-year average. Table III repeats the data of
Table II on correlations with first-year 
achievement and gives in parallel columns 
the correlations with second-year achieve- 
ment. The differences found in the second- 
year groups are still negligible, although 
somewhat more consistently in favor of 
Regents averages than in the first-year 
study. Standard errors were not calculated 
for the differences in correlations with 
second-year achievement because in all cases 
they would exceed the values found for the 
first-year groups and the differences are statistically unreliable even when judged
against first-year standards. 

TABLE III

CORRELATIONS OF COMPREHENSIVE TEST AND REGENTS AVERAGE WITH ACHIEVEMENT IN
THE FIRST AND SECOND YEARS OF COLLEGE, 1940-42

                        First Year                        Second Year
College     N     Comp.    Regents    S.E. of        N     Comp.    Regents
                  Test     Average    Diff.¹               Test     Average
Men
A         156     .49      .52        ±.05         148     .45      .41
B          70     .45      .49         ..           53     .47       ..
C          64     .67       ..        ±.07          63     .87      .46
D²         48     .52      .56        ±.09          45     .57      .63
E          42     .39      .50        ±.14          37     .30       ..
F          35     .29      .05        ±.19          30     .18      .25
G          33     .62      .62        ±.10          28     .54      .66
Women
A          89     .68      .63        ±.06          85     .55      .54
B          ..     .29      .87        ±.09          85     .35      .46
D²         25     .56      .87        ±.13          19     .53      .87
E          44     .48       ..         ..           42     .34      .32
F         104     .34       ..        ±.09          90     .36      .41
H          72     .54      .54        ±.08          64     .44      .51

[Entries shown as .. are illegible in the source; in rows with missing entries, the placement of the surviving values is uncertain.]

¹ The standard errors of the differences between the correlations in the two preceding columns.
² Two-year average reported as the second-year average.

Item analysis of the 483 objective items of the Comprehensive High School Test, 1940 Form, against the criterion of score on the objective portion of the test revealed that only eight items were statistically invalid, while most showed discriminating power between superior and inferior candidates. Analysis of these eight items indicates that checks at the disposal of the constructors of the test can reduce the number of such questions in subsequent tests to less than 1%. A quick preliminary scoring and item analysis of 500 answer sheets will suffice to identify these few items so that they may be disregarded when scoring the test for scholarship purposes.
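The article does not state which item-analysis statistic was used. One common possibility, shown here purely as an assumption, is an upper-lower discrimination index. Candidates are ranked on total objective score, and for each item the proportion answering correctly in the top 27 per cent is compared with the proportion in the bottom 27 per cent. An item whose index is near zero or negative would be a candidate for being disregarded. The data layout, the 27 per cent cut, and the function name are all hypothetical.

```python
def discrimination_indices(responses, fraction=0.27):
    """Upper-lower discrimination index for each item.

    `responses` holds one list of 0/1 item scores per candidate (hypothetical layout).
    Candidates are ranked on total score; the index for an item is the proportion
    correct in the top `fraction` minus the proportion correct in the bottom `fraction`.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    g = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:g], ranked[-g:]
    n_items = len(responses[0])
    return [
        (sum(c[j] for c in upper) - sum(c[j] for c in lower)) / g
        for j in range(n_items)
    ]


# Six hypothetical candidates answering four items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
print(discrimination_indices(responses))   # [0.5, 1.0, 0.5, 0.0]
```

In this toy example the fourth item does not separate the upper and lower groups at all. Under this assumed index it is the kind of item that would be dropped before final scoring.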

Analysis of the distributions of scores of 
3006 scholarship candidates on the nine* dis- 
tinguishable sub-sections of the objective 
portion of the 1940 Comprehensive Test re- 
vealed that, with one exception, all sections 
of the test yielded satisfactory distributions 
with median scores from 55% to 71% of the 
possible totals. The reading section produced 
a skewed distribution with the median score 
at 84% of the possible total of 25. The ex- 
perience with this section, which had been 
designed to present the difficulties involved 
in advanced vocabulary, complexity of sen- 
tence structure, and complexity and abstract- 
ness of content, leads to the conclusion that 
questions based on the content of what is 
read are inadequate for testing scholarship 
candidates. In reading sections of subsequent 
forms of this test the questions must chal- 
lenge the reader to apply to what is read a 
background of factual information and un- 
derstanding of principles. 
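In raw points, a median at 84% of a possible total of 25 is a median score of 21. The sketch below shows that arithmetic. It adds one conventional index of skew, Pearson's second coefficient, 3(mean - median)/SD, which is chosen here for illustration; the article does not say how it judged skewness. The scores are hypothetical. A negative coefficient means scores pile up toward the top of the scale.

```python
import statistics
from typing import Sequence


def median_percent(scores: Sequence[int], possible_total: int) -> float:
    """Median score expressed as a percentage of the possible total."""
    return 100.0 * statistics.median(scores) / possible_total


def pearson_skew(scores: Sequence[int]) -> float:
    """Pearson's second coefficient of skewness: 3 * (mean - median) / SD.
    Negative values indicate a pile-up near the top of the scale."""
    mean = statistics.mean(scores)
    med = statistics.median(scores)
    sd = statistics.pstdev(scores)
    return 3.0 * (mean - med) / sd


# Hypothetical reading-section scores out of a possible 25.
reading = [12, 16, 18, 20, 21, 21, 22, 23, 24, 25, 25]
print(median_percent(reading, 25), round(pearson_skew(reading), 2))
```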

The individuals tested showed significantly
different patterns of achievement in the several
sub-sections of the test. The value of this type
of test for guidance and placement purposes has
already been demonstrated by a variety of testing
services. The possibility of such use is an
incidental value of this test.

* See Table I for breakdown.


SUMMARY AND IMPLICATIONS 


1. The Comprehensive High School Test, 
1940 Form, is essentially as effective a
predictor of general academic achievement in
New York State colleges as is the Regents 
average currently used. This finding war- 
rants the substitution of the Comprehensive 
Test for the Regents average as the basis of 
scholarship award beginning in 1944, in view 
of other advantages achieved by the change. 

2. The objective portion of the Compre- 
hensive High School Test is essentially as 
effective a predictor of general academic
achievement in New York State colleges as 
is the test as a whole, including the essays. 
The essay portion of the test, as rated, con- 
tributes little to prediction of general aca- 
demic achievement in college. This finding 
confirms findings of other test constructors 
that objective tests are most effective in pre- 
dicting general scholastic achievement. The 
implications of the finding for the construc- 
tion of annual forms of the Comprehensive 
High School Test, however, are affected by 
the consideration that omission of all written 
expression from the test would leave an
important outcome of instruction untested. The
essay portion of the test should probably be
retained and improved as far as possible.
Several variations on the practice followed in 
1940 need to be considered. 


3. The practice followed in 1940 in assembling
questions for the objective portion of the
Comprehensive High School Test has proved
highly satisfactory in producing valid,
discriminating items. The few invalid items
that find their way into any subsequent test
can be identified and removed from the scoring
by analysis of a small sample at the start.


4. A more difficult and searching reading 
section is required. Rather than calling for
greater sharpness and ingenuity in interpreting
what is read, the revised section may well
require the application of a background of
facts and principles to the content of what is
read.

5. The Comprehensive High School Test 
taps aspects of competence for college admis- 
sion not tapped by psychological examina- 
tions. The differences between the correla- 
tions of the two types of test with first-year 
college achievement probably reflect the 
effect of including in the Comprehensive 
High School Test questions calling for
important information as well as skill.
6. The Comprehensive High School Test
gives recognition to comprehensive under- 
standing of the world in which the candidate 
lives and to competence in the intellectual
skills necessary to operate effectively in it at 
the post-high school level. It is a comprehen- 
sive achievement test that is not simply a 
composite of tests of achievement in separate 
subjects recently studied, nor yet does it give 
great weight to aptitude apart from its appli- 
cation to mastery of basic skill and signifi- 
cant information. It is, therefore, neither an 
achievement test nor an aptitude test in the 
ordinary sense. Rather, as a test of general
intellectual competence accumulated and
maintained for use, it may well be extended
in the direction of testing ability to apply to 
practical situations such higher mental 
processes as recognition of consistency be- 
tween ideas, recognition of relevancy and 
sufficiency of proofs, syllogistic reasoning, 
appreciation of the fine arts, including liter- 
ature, and application of generalizations and 
principles to new situations. Also, as the 
core-curriculum is extended, as is being done 
in the New York State social studies pro- 
gram, the scope of information and skill to 
be tested will be widened. Always, the cri- 
terion for inclusion of a question in the test 
will be that accomplishment of the task set 
by the question shall tend to reflect effective 
possession of significant mental skill,
understanding, appreciation, or information
acquired in the course of educational experience
available to all. 
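Summary item 2 finds that the rated essays add little to prediction once the objective portion is known. This can be illustrated with the standard formula for the multiple correlation of a criterion with two predictors, computed from the three zero-order coefficients: R² = (r_y1² + r_y2² - 2·r_y1·r_y2·r_12) / (1 - r_12²). All correlation values below are hypothetical, not taken from the article. Note that R assumes optimal weights, so a fixed-weight composite such as the test total could do no better.

```python
import math


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of criterion y with predictors 1 and 2,
    computed from the three zero-order correlations (optimal weights)."""
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)
    return math.sqrt(r_sq)


# HYPOTHETICAL correlations, not reported in the article:
r_objective = 0.55   # objective portion vs. college average
r_essay = 0.35       # essay rating vs. college average
r_obj_essay = 0.60   # objective portion vs. essay rating

combined = multiple_r(r_objective, r_essay, r_obj_essay)
print(f"objective alone: {r_objective:.3f}, objective + essay: {combined:.3f}")
```

Under these assumed values the essay raises the coefficient by only about .001. When a second predictor correlates substantially with the first and only modestly with the criterion, it contributes almost nothing beyond what the first already carries.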